Re: [e1000-devel] No versions for igb driver available on "modinfo igb" for SLE15-SP4

2024-01-18 Thread Jesse Brandeburg
On 1/11/2024 4:21 AM, kumar.mo...@swisscom.com wrote:
> Hi,
> 
> Unlike the previous releases for the drivers, we don’t see the column for 
> version anymore for the Intel(R) Gigabit Ethernet Network Driver in SUSE 
> Linux (SLE15-SP4) when doing modinfo igb.
> 
> Is this something expected and if yes, is there any other way to get the igb 
> driver versions? For older SUSE installations we have, we see 5.6.0-k or 
> something like that for igb driver versions.

Hi Mohit,

The Intel out-of-tree (OOT) drivers (like the one you download from
sourceforge) have a version number in them, but in the upstream, version
numbers were removed by the kernel community, and the version is
equivalent to the kernel the driver was released with.

If there was some reason you thought you needed a driver version, please
let us know.

The reason the kernel community removed the driver versions from
upstream (and therefore from consumers of upstream, like the SLES distro
you mention) is that the version numbers were misleading, wrong, or not
kept up to date. Basically, comparing in-kernel to OOT drivers using a
version number is not a good idea, as the drivers are not the same:
they're two different products released at different times, with
differing functionality.

If you need the specific upstream commit that igb was updated to in the
SLE15 SP4 release, please contact SUSE.
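
As a rough illustration of the point above, the sketch below takes the
text printed by "modinfo igb" and reports the driver's own version field
when there is one (OOT builds), falling back to the kernel release in
the vermagic field otherwise (upstream builds). The function name and
the two sample outputs are made up for illustration; they are not real
output from any particular system.

```python
def driver_version(modinfo_text):
    """Return (source, value) for the best version info in modinfo output."""
    fields = {}
    for line in modinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep the first occurrence of each field (alias/parm repeat a lot).
        if sep and key and key not in fields:
            fields[key] = value.strip()
    if fields.get("version"):
        return ("version", fields["version"])
    vermagic = fields.get("vermagic", "").split()
    if vermagic:
        # vermagic starts with the kernel release the module was built for.
        return ("kernel", vermagic[0])
    return ("unknown", "")


# Hypothetical sample outputs, shortened for illustration only.
OOT_SAMPLE = """filename:       /lib/modules/example/igb.ko
version:        5.6.0-k
vermagic:       4.12.14-example SMP mod_unload
"""
UPSTREAM_SAMPLE = """filename:       /lib/modules/example/igb.ko
license:        GPL v2
vermagic:       5.14.21-example SMP mod_unload modversions
"""

print(driver_version(OOT_SAMPLE))
print(driver_version(UPSTREAM_SAMPLE))
```

On a real system the input would come from running modinfo and
capturing its output; the kernel release is only a stand-in for a
driver version, not an equivalent of the OOT numbering.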

Hope this helps!
Jesse


___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://community.intel.com/t5/Ethernet-Products/bd-p/ethernet-products

Re: [e1000-devel] Intel E810 100Gb goes down sporadically

2023-12-05 Thread Jesse Brandeburg
On 12/3/2023 1:26 AM, Assaf Albo via E1000-devel wrote:
> Hello guys,
> 
> We are having constant network issues in production in that the link goes
> down, waits *exactly* 7-8 seconds, and goes up again.
> This can happen zero to a few times a day on all our servers; they are not
> in the same location and are connected to different network devices.
> 
> Each server runs as a KVM virtual machine with 60 CPUs (Pinning) and 224Gi
> (Huge pages) - overall performance is excellent.
> The NIC is PCI passed through to the KVM machine AS IS.
> OS Rocky Linux 8.5, kernel 4.18.0-348.23.1.el8_5.x86_64 with Intel ice
> 1.9.11 built and installed using rpm.
> We have a traffic generator between two servers (our app: client+server)
> that is reaching 94Gb and can replicate this issue.
> 
> The dmesg once the issue occurs:
> Nov 28 16:01:27 SERVER kernel: ice :00:06.0 eth0: NIC Link is Down
> Nov 28 16:01:35 SERVER kernel: ice :00:06.0 eth0: NIC Link is up 100
> Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg
> Advertised: Off, Autoneg Negotiated: False, Flow Control: None

Hi Assaf, sorry to hear you're having problems.

With respect to the link-down events, we need to determine whether the
link is going down locally or at the remote end.

Please gather the 'ethtool -S eth0' statistics for a system that has had
some problems, and send to the list as text.

Also send the output of 'ethtool -m eth0'.

The passthrough device shouldn't be any problem but I do recommend that
if you're passing through the device to a VM, you try to match the
destination PCIe function number to the origination ID to prevent odd
issues.

like if your host device is:
01:00.1 then (I'm not sure you can do this) I'd hope the VM device is
00:06.1, and not 00:06.0

So, with that in mind, I'd ask: do you ever see the link-down issue on
systems where the function numbers do match, for example:
3b:00.0 (ice PF PCIe in host)
00:06.0 (ice PF in VM)
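
To make the function-number comparison concrete, here is a small sketch
that parses PCI addresses in the bus:device.function form used above
and checks whether host and guest share the same function number. The
helper names are illustrative, and the pattern only covers the common
lspci-style notation (optional 4-digit domain, function 0-7).

```python
import re

# Optional "dddd:" domain, then "bb:dd.f" as printed by lspci.
_BDF = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(address):
    """Split a PCI address into (domain, bus, device, function) integers."""
    match = _BDF.match(address.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % address)
    domain, bus, device, function = match.groups()
    return (int(domain or "0", 16), int(bus, 16), int(device, 16), int(function))


def functions_match(host_address, guest_address):
    """True when host and guest addresses use the same function number."""
    return parse_bdf(host_address)[3] == parse_bdf(guest_address)[3]


print(functions_match("3b:00.0", "00:06.0"))  # both are function 0
print(functions_match("01:00.1", "00:06.0"))  # function 1 vs function 0
```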

Please include output from devlink dev info, and if you know it, what
switch you're connected to.

Also, do you see any stats or events on the switch side when link is lost?

- Jesse




Re: [e1000-devel] igb driver compilation error on Ubuntu

2023-11-07 Thread Jesse Brandeburg
On 10/30/2023 3:27 AM, adelio ALVES wrote:

Thanks for your report!

Something happened to the content of your message when I released it to
the mailing list.

Please use the driver included in your kernel (igb.ko.xz or the like)
and let us know if you have any problems.

Was there a reason you wanted to run the out-of-tree igb-5.7.2 driver?

Kernel version 5.15.0-97-generic should already have a working igb driver.

Thanks,
Jesse





Re: [e1000-devel] Establish uni-directional ethernet link

2023-07-28 Thread Jesse Brandeburg
On 7/28/2023 6:26 AM, Alireza Sadeghpour wrote:
> Hi, I am trying to establish a uni-directional Ethernet link where a
> singular fiber is used to transmit data to the receiver where both sides
> use ixgbe as driver. The Rx of the transmit side and the Tx of the receive
> side are not physically connected, as in a data diode scenario. The
> problem is that as soon as I detach the Tx line from one side, the link
> status on both sides goes DOWN. Is it possible to mask the link status in
> the ixgbe driver to force it into the UP state on both sides?

Yes, there is a force-link-up bit, called AUTOC.FLU.

You may have to set some other registers in AUTOC to force link speed, etc.

I'm pretty sure this will work as I've done it in the past, but your
mileage may vary and this is way outside normal for the Linux driver, so
I can't help you much beyond this email.

If you still need help after trying the above, I recommend you contact
Intel Support.

Jesse




Re: [e1000-devel] Issue with Intel Corporation 82546EB dual port card on Ubuntu 22.04

2023-05-12 Thread Jesse Brandeburg
On 5/11/2023 9:54 PM, Igor Cicimov wrote:
> Hi,
> 
> I have a problem with my 8086:1010 Intel Corporation 82546EB Gigabit
> Ethernet Controller (Copper) dual port ethernet card and Ubuntu 22.04.2 LTS
> using e1000 driver:

This card is from 2003! :-) Nice that it's still running!



Did you file a bug with Canonical against Ubuntu or ask for help over
there yet?

> that I have configured in LACP bond0:
> 
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v5.15.0-69-generic
> 
> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
> Transmit Hash Policy: layer2+3 (2)
> MII Status: down
> MII Polling Interval (ms): 100
> Up Delay (ms): 100
> Down Delay (ms): 100
> Peer Notification Delay (ms): 0
> 
> 802.3ad info
> LACP active: on
> LACP rate: fast
> Min links: 0
> Aggregator selection policy (ad_select): stable
> System priority: 65535
> System MAC address: MAC_BOND0
> bond bond0 has no active aggregator

Did you try bonding without MII link monitoring? I'm wondering if you're
getting caught up in the ethtool transition to netlink for some reason.


> 
> Slave Interface: eth1
> MII Status: down
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: MAC_ETH1
> Slave queue ID: 0
> Aggregator ID: 1
> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 1
> Partner Churned Count: 1
> details actor lacp pdu:
> system priority: 65535
> system mac address: MAC_BOND0
> port key: 0
> port priority: 255
> port number: 1
> port state: 71
> details partner lacp pdu:
> system priority: 65535
> system mac address: 00:00:00:00:00:00
> oper key: 1
> port priority: 255
> port number: 1
> port state: 1
> 
> Slave Interface: eth2
> MII Status: down
> Speed: 1000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: MAC_ETH2
> Slave queue ID: 0
> Aggregator ID: 2
> Actor Churn State: churned
> Partner Churn State: churned
> Actor Churned Count: 1
> Partner Churned Count: 1
> details actor lacp pdu:
> system priority: 65535
> system mac address: MAC_BOND0
> port key: 0
> port priority: 255
> port number: 2
> port state: 71
> details partner lacp pdu:
> system priority: 65535
> system mac address: 00:00:00:00:00:00
> oper key: 1
> port priority: 255
> port number: 1
> port state: 1
> 
> that is in state down of course since both interfaces have MII Status:
> down. The dmesg shows:
> 
> # dmesg | grep -E "bond0|eth[1|2]"
> [   42.999281] e1000 :01:0a.0 eth1: (PCI:33MHz:32-bit) MAC_ETH1
> [   42.999292] e1000 :01:0a.0 eth1: Intel(R) PRO/1000 Network Connection
> [   43.323358] e1000 :01:0a.1 eth2: (PCI:33MHz:32-bit) MAC_ETH2
> [   43.323366] e1000 :01:0a.1 eth2: Intel(R) PRO/1000 Network Connection
> [   65.617020] bonding: bond0 is being created...
> [   65.787883] 8021q: adding VLAN 0 to HW filter on device eth1
> [   67.790638] 8021q: adding VLAN 0 to HW filter on device eth2
> [   70.094511] 8021q: adding VLAN 0 to HW filter on device bond0
> [   70.558364] 8021q: adding VLAN 0 to HW filter on device eth1
> [   70.558675] bond0: (slave eth1): Enslaving as a backup interface with a
> down link
> [   70.560050] 8021q: adding VLAN 0 to HW filter on device eth2
> [   70.560354] bond0: (slave eth2): Enslaving as a backup interface with a
> down link
> 
> So both eth1 and eth2 are UP and recognised, ethtool says "Link detected:
> yes" but their links are DOWN. I have a confusing port type of FIBRE
> reported by ethtool (capabilities reported by lshw are capabilities: pm
> pcix msi cap_list rom ethernet physical fibre 1000bt-fd autonegotiation).
> It is weird and I suspect some hardware or firmware issue. Any ideas are
> welcome.

You didn't post the bonding options you have enabled or your bonding
config file.

Did you try the use_carrier=1 option? It's the default, but you're not
setting it to zero, are you?
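
One way to answer the question above without digging through config
files is to read the live bonding options from sysfs. The sketch below
assumes the bonding driver's per-bond directory under /sys/class/net
and an illustrative choice of three option files; the function name is
made up, and missing files are reported as None rather than treated as
errors.

```python
from pathlib import Path


def read_bond_options(bond="bond0",
                      names=("mode", "miimon", "use_carrier"),
                      sysfs_root="/sys/class/net"):
    """Read selected bonding options for one bond from sysfs."""
    base = Path(sysfs_root) / bond / "bonding"
    options = {}
    for name in names:
        try:
            options[name] = (base / name).read_text().strip()
        except OSError:
            # Option file absent (no such bond, or option not exposed).
            options[name] = None
    return options


if __name__ == "__main__":
    for key, value in read_bond_options().items():
        print("%s = %s" % (key, value))
```

The sysfs_root parameter exists only so the function can be pointed at
a test directory; on a live system the default should be what you want.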

> 
> P.S: It is not the switch or the switch ports and it is not the cables
> already tested that. The same setup, switch+cables+card was working fine up
> to Ubuntu 18.04

The "Supported ports: [ FIBRE ]" thing is strange, but it really shouldn't
matter.







Re: [e1000-devel] error in e1000e driver under ubuntu20.04 HWE with kernel 5.15.0-67

2023-03-28 Thread Jesse Brandeburg
On 3/27/2023 12:32 AM, ST Cai wrote:
> OS:Ubuntu Server 20.04 HWE
> Kernel:5.15.0-67-generic
> Adapter:Intel(R) I219-V;Vendor:0x8086;Product:0x15be;
> Drivers :e1000e-3.8.4/e1000e-3.8.4;
> 
> make install Error:
> 
> ethtool.c:2838:19: error: initialization of ‘int (*)(struct net_device *,
> struct ethtool_coalesce *, struct kernel_ethtool_coalesce *, struct
> netlink_ext_ack *)’ from incompatible pointer type ‘int (*)(struct
> net_device *, struct ethtool_coalesce *)’
> [-Werror=incompatible-pointer-types]
> 
> ethtool.c:2838:19: note: (near initialization for
> ‘e1000_ethtool_ops.get_coalesce’)
> /home/egw/e1000e-3.8.7/src/ethtool.c:2839:19: error: initialization of ‘int
> (*)(struct net_device *, struct ethtool_coalesce *, struct
> kernel_ethtool_coalesce *, struct netlink_ext_ack *)’ from incompatible
> pointer type ‘int (*)(struct net_device *, struct ethtool_coalesce *)’
> [-Werror=incompatible-pointer-types]
> 
> How to solve?

The e1000e driver included with the kernel you already have should work
fine. Please try it and let us know.

The e1000e driver from this site is not being actively maintained and
was last released in 2020.






Re: [e1000-devel] missing symbols ice and i40e for irdma modules

2023-03-13 Thread Jesse Brandeburg
On 3/10/2023 12:25 AM, Dmitry Kravkov wrote:
> After loading the ice 1.11.14 module, which was compiled for 6.1.8,
> the irdma module is not able to load due to missing symbols:
> [1000969.082365] irdma: Unknown symbol ice_del_rdma_qset (err -2)
> [1000969.082599] irdma: Unknown symbol ice_add_rdma_qset (err -2)
> [1000969.082738] irdma: Unknown symbol ice_rdma_update_vsi_filter (err -2)
> [1000969.082856] irdma: Unknown symbol ice_rdma_request_reset (err -2)
> [1000969.082869] irdma: Unknown symbol ice_get_qos_params (err -2)
> 
> similar happens for i40e 2.22.18
> 
> 

We recommend you only ever run matched sets of drivers. Does installing
the OOT (out-of-tree) irdma driver work?
https://www.intel.com/content/www/us/en/download/19632/linux-rdma-driver-for-the-e810-and-x722-intel-ethernet-controllers.html






Re: [e1000-devel] [ice] Intel E810-C one queue after reboot

2023-01-24 Thread Jesse Brandeburg

On 1/21/2023 6:17 AM, Highload Admin wrote:

Hello!


FYI, general spam filter guidance is to ignore mail from admin@ addresses, 
so our list rejects your subscription.


I'd change it but we get a lot of mails from bogus admin@ accounts.


I have a problem with the ice driver (I tried versions 1.10.1.2 and 1.10.1.2.2).
After reboot I see only one queue:
# ethtool -l enp129s0
Channel parameters for enp129s0:
Pre-set maximums:
RX: 1
TX: 1
Other:  0
Combined:   1
Current hardware settings:
RX: 0
TX: 0
Other:  0
Combined:   1


The above looks like "safe mode", which is a backup mode for the driver 
when something goes wrong during driver load, there should be messages 
in dmesg.



After reloading the ice driver, all is fine: 128 queues
# ifdown  enp129s0; rmmod ice; modprobe ice; ifup enp129s0;
# ethtool -l enp129s0


This is likely because you didn't run "make install" when originally 
building/installing the drivers, or because the install failed to update 
your initramfs (which also contains drivers). The result is that one file 
(an older driver) is loaded at boot, while after boot, if you use 
rmmod/modprobe, the driver is loaded from your filesystem.



Channel parameters for enp129s0:
Pre-set maximums:
RX: 128
TX: 128
Other:  1
Combined:   128
Current hardware settings:
RX: 0
TX: 0
Other:  1
Combined:   128


This is good news, it means that if you get the driver installed 
correctly into your initramfs/initrd, things will be fine.
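
A quick way to spot the boot-time versus on-disk mismatch described
here is to compare the srcversion of the module that is currently
loaded (from /sys/module) with the srcversion of the file on disk, which
"modinfo -F srcversion ice" prints. The sketch below is only that
comparison; the function name is made up, and it assumes the loaded
module exposes a srcversion file, which not every build does.

```python
from pathlib import Path


def loaded_matches_disk(module, disk_srcversion, sysfs_root="/sys/module"):
    """Compare the loaded module's srcversion with the on-disk one.

    Returns True or False, or None when the loaded module exposes no
    srcversion (not loaded, or built without that information).
    """
    try:
        loaded = (Path(sysfs_root) / module / "srcversion").read_text().strip()
    except OSError:
        return None
    return loaded == disk_srcversion.strip()
```

If this reports False right after boot and True after rmmod/modprobe,
the initramfs is most likely still carrying the older driver.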



Hardware:
Supermicro H12DSU-iN motherboard
AMD EPYC 7742 64-Core Processor

Software:
OS Debian 10.13
Linux 4.19.0-23-amd64
# ethtool -i enp129s0
driver: ice
version: 1.10.1.2
firmware-version: 3.20 0x8000d855 1.3146.0
expansion-rom-version:
bus-info: :81:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

How can I resolve my problem?


Please try 'sudo make install' from the ice-1.0.10.2/src directory.

You can check that your problem is as I stated by using
'sudo lsinitrd | grep /ice.ko'

If you continue to have issues please get back to us.



___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P0018NbWAI/intel-ethernet


Re: [e1000-devel] i40e: Intel XL710: Linux Debian 11: rx/tx-vlan-offload seems to be broken since 2.18.9

2023-01-03 Thread Jesse Brandeburg

On 12/30/2022 11:15 AM, Andrey Kulikov wrote:

Hello,

I've got an Intel Fortville XL710-based Ethernet controller with 4 x 10GbE
SFP+ ports.
Platform is based on Intel Xeon CPU E5-2697 v4
Platform running Debian 11.6, kernel 5.10.0
# uname -a
Linux 5.10.0-20-amd64 #1 SMP Debian 5.10.158-2 (2022-12-13) x86_64 GNU/Linux

Current i40e driver: 2.22.8 (out-of tree, self-built from sources).


Are you loading the 8021q.ko module?


Setup: Two identical platforms, with absolutely identical hardware and
software. Connected directly with LC-LC SR patchcord using Intel 10G SFP+
transceivers (FTLX8571D3BCV-IT, if it makes any difference).

Issue: When I configure a VLAN on the HW interface, it doesn't work. When
pinging via the VLAN, the other side just does not see anything (tcpdump
shows nothing).
At the same time, if I ping on HW interfaces directly - it does work
perfectly well.

But it was working with i40e driver 2.18.9 half of a year ago, with Debian
11.4(? here I could be wrong) kernel.

Relevant fragment from my /etc/network/interfaces on one side:

auto enp132s0f0
iface enp132s0f0 inet static
 address 192.168.33.2
 netmask 255.255.255.0
 mtu 1500

auto enp132s0f0.545
iface enp132s0f0.545 inet static
 address 192.168.44.2
 netmask 255.255.255.0
 mtu 1500


Once the network setup is done, what does 'ip link' show?


The other side looks identical except IP-addresses (they both end with '1').

Workaround: disable tx-vlan-offload and rx-vlan-offload:

ethtool -K enp132s0f0 tx-vlan-offload off rx-vlan-offload off

Checked with CISCO NEXUS 7000 and NEXUS 9000 as remote counterparts - they
behave identically to described above.


Based on what you said I doubt it's switches or cables, but something is 
up with your config.




Current XL710 firmware is 8.15. But I've got adapters with firmware
7.something - there is no difference in behavior.

Does it ring a bell?


I don't recall hearing reports of other issues.


Does it have something to do with the i40e driver?
Is any further information required?


When pinging, it would be useful to see what ethtool -S shows as 
changing, like

ethtool -S enp132s0f0
ping -c2 192.168.33.1

ethtool -S enp132s0f0

arp -an
output would be useful as well.
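
To see "what ethtool -S shows as changing" without reading two long
dumps by eye, something like the following sketch can diff two captured
snapshots and print only the counters that moved. The function names
and the tiny sample snapshots are illustrative; real i40e output has
many more counters with different names.

```python
def parse_ethtool_stats(text):
    """Turn 'ethtool -S' output into a {counter_name: int} dictionary."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names may contain ':' or '.'.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value.strip())
        except ValueError:
            continue  # e.g. the "NIC statistics:" header line
    return stats


def changed_counters(before_text, after_text):
    """Return {counter_name: delta} for counters whose value changed."""
    before = parse_ethtool_stats(before_text)
    after = parse_ethtool_stats(after_text)
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


# Hypothetical, heavily shortened snapshots taken around a ping.
BEFORE = "NIC statistics:\n     rx_packets: 10\n     tx_packets: 7\n     rx_dropped: 0\n"
AFTER = "NIC statistics:\n     rx_packets: 10\n     tx_packets: 9\n     rx_dropped: 0\n"

print(changed_counters(BEFORE, AFTER))
```

In this made-up sample only the transmit counter moves, which is the
kind of pattern that would suggest frames leave but nothing comes back.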






Re: [E1000-devel] Driver issue 5.4.0-1064 Kernel

2022-11-04 Thread Jesse Brandeburg

On 11/3/2022 9:05 AM, Trey Hughes via E1000-devel wrote:
Good morning, I'm having an issue with installing the e1000e driver 
version 3.8.4 on Ubuntu 18.04 with Kernel 5.4.0-1064. When I go to make 
install per the readme instructions, I get an error stating the 
UTS_UBUNTU_RELEASE_ABI is too large. When I look at the code, it seems 
that it checks whether the release ABI is >255 and, if so, errors out of 
the install. Is there a way around this or is there another driver I 
should be using for this kernel? Any help would be greatly appreciated. 
Thank you! Trey Hughes


Your kernel should already include a working e1000e driver. If the 
in-kernel driver is not working then you should follow up with Ubuntu's 
bug tracker (but this is a pretty old release now).


Also, it would be useful to know what hardware you're running, e.g. the 
output from lspci -nn.






Re: [E1000-devel] Citrix Hypervisor (XenServer) - very poor performance with X710

2022-03-14 Thread Jesse Brandeburg

On 3/14/2022 8:28 AM, Kevin Bowling wrote:

Fortville (700) has always been a bit of a disaster
(https://cdrdv2.intel.com/v1/dl/getContent/331430?explicitVersion=true),
I'd see if you can press your Intel reps into getting you the 550s or
the 800-series NICs for the unnecessary trouble; it's a much nicer
design.

It's surprising they are shipping new cards with that old of a
firmware, you should be on 8.50 for the driver you are running
(https://www.intel.com/content/www/us/en/download/18635/non-volatile-memory-nvm-update-utility-for-intel-ethernet-adapters-700-series-linux.html).
Doing the FW update is worth a shot but most issues I've seen have
been driver related and you are running a pretty recent driver.

Regards,
Kevin

On Mon, Mar 14, 2022 at 7:44 AM Matthew Weiner
 wrote:


I'm at my wits end with this, Citrix is stumped, Dell is stumped, and with
the supply chain issues the way they are we can't just yank these out in
favor of X550s.  The problem is we have a group of Dell R740s with X710
dual-port NICs and the performance is, in a word, awful.  Like 5-6 megabit
upload and 250 megabit download awful.  However, identical server hardware
with any other card, be it a Broadcom or an Intel X550T, no issues.  We can
get line rate all day long.  The latest attempt was swapping the X710 for a
newer X710-T2L-t, which performed maybe 5-10 percent better.  We've tried
three different driver revisions, firmware, BIOS, all the available
Hypervisor updates, it still performs the same.

The servers in question have X550s on the motherboard mezzanine card which
perform fine, and a single dual-port X710 in the PCIe riser.  The X710 is
set up with an LACP pair trunked with three VLANs tagged across it.  In
this pool we also have servers with X550s on the PCIe cards, and
Broadcoms.  All those with an identical configuration perform without
issue, it's only the X710s that show this problem.


Hi Matt, sorry to hear about this problem. Let's poke a bit (please be 
patient with me) and see if we can help you.


Have you followed steps like the ones described here?
https://www.thomas-krenn.com/en/wiki/Intel_Ethernet_700_Series_LACP_Configuration

There are definitely known problems with LACP mode and the driver's 
default settings.


You can try the above workaround and see if it helps. If that does help, 
then there are ways to make the settings get applied by ethtool as the 
system comes up.


Please let us know how it goes.

It would be helpful to know what kernel you're running, just for good 
measure.


PS. You may want to subscribe to e1000-devel as it is currently holding 
your messages because you're not a subscriber, and they have to be 
manually released.





Re: [E1000-devel] errors on link xl 710

2021-11-23 Thread Jesse Brandeburg

On 11/22/2021 9:43 PM, Jakub Osuch wrote:

I have errors on link between nexus N3K-C3064PQ-10GX and LREC9902BF-2QSFP+.
driver: i40e
version: 2.17.4

Take a look:
shorturl.at/crCNQ



Hi Jakub, please file a bug at
https://sourceforge.net/p/e1000/bugs/

Which will allow you to attach relevant information. We're a little wary 
of clicking on random links, you can hopefully appreciate why.


Thanks







Re: [E1000-devel] [Intel-wired-lan] Not able to create VFs on PF passthrough of ethernet interface to VM

2019-05-13 Thread Jesse Brandeburg
On Mon, 13 May 2019 07:36:41 + Periyasamy wrote:
> Hi,
> 
> I’m trying to achieve PF passthrough of 40/10G ethernet interface (i40e) into 
> guest VM running on qemu/kvm hypervisor and then create VFs on the PF inside 
> the VM.
> This is to have a flexibility and better manageability of VFs inside the VM 
> (for example, kubernetes worker node) itself and not on the host.
> 
> 
> The ethernet PCI device is seen inside the VM and bound to i40e driver. But I 
> don’t see an option to create VFs. i.e. sriov_numvfs file is not seen under 
> /sys/devices/pci:00/:00:02.1/:02:00.0 directory.

Hi Periyasamy,

The PCI space itself is not passed-through, it is completely fake and
generated by QEMU.

Do you know if anyone has ever gotten what you're trying to do to
work?  I don't think you can do what you're trying to do with using a
VM to spawn SR-IOV devices, at least I've not heard of it working.

Basically you have a scoping problem.  At its core, the PCI space is
owned by the host, not the VM, and the hardware is literally in the
host PCI device space no matter where you pass it to.  The hardware
actually creates (starts decoding addresses and PCI space for) the new
PCI devices when you enable the device via sriov_numvfs.  Those devices
will appear in space reserved by the host, for SR-IOV devices to
"appear", but there is no guarantee that memory range will be passed
through to the VF, and again all the VM PCI devices are "fake" PCI
config space, so without some daemon monitoring and adding the devices
via virsh or something, I doubt the VM would ever even see them.
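
For reference, the sketch below shows one way to check from inside a
host or guest whether a PCI device's sysfs directory exposes the SR-IOV
controls at all. The function name is made up and the device directory
is passed in by the caller; a value of None for both entries would
match the symptom reported in this thread.

```python
from pathlib import Path


def sriov_status(device_dir):
    """Report sriov_totalvfs/sriov_numvfs for a PCI device sysfs directory.

    Each value is an int, or None when the file is missing or unreadable.
    """
    device = Path(device_dir)
    result = {}
    for name in ("sriov_totalvfs", "sriov_numvfs"):
        try:
            result[name] = int((device / name).read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result
```

Called with the device's directory under /sys/bus/pci/devices/, this
only reports what the running kernel exposes; it does not change the
fact that the config space the guest sees is generated by QEMU.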


> Host versions:
> OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-48-generic, libvirt: 4.0.0, qemu: 
> 2.11.1
> i40e version: 2.1.14-k, firmware-version: 6.01 0x800034a3 1.1747.0
> 
> Guest versions:
> OS: CentOS 7 (Core) Kernel: 3.10.0-862.14.4.el7.x86_64
> i40e version: 2.1.14-k, firmware-version: 6.01 0x800034a3 1.1747.0
> 
> The VM libvirt xml configuration [1], PF configuration at host [2], PF 
> configuration at VM [3] are attached.
> The lspci output line nos. 63-75 related to SRIOV Capabilities in host [2] 
> are missing in VM which looks bit weird.

As per above, the PCI config space is completely virtualized by QEMU.

Hope this helps!
Jesse


___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] jitter / latency reduction

2017-03-29 Thread Jesse Brandeburg
On Mon, 6 Mar 2017 08:09:42 -0800
Mahmood Qazen  wrote:

> greetings Leonardo
> this is the slide / pdf I found and towards the end it asks if we
> could help.
> enjoy
> Mahmood -

Hi developers, thanks for your interest, we’d love to have help, but the
good/bad news is that this is implemented already upstream, and known as
busy_poll support in the kernel.  Also, most if not all the active
drivers right now, at least from heavily used drivers, support the
“built-in” model that busy poll has migrated to.  This allows most if
not all drivers with NAPI support (normal) in the kernel to have
busy_poll support if it is enabled at runtime.  I believe there is
currently some work to do still to get epoll working correctly, and
there probably is room for refactoring/improvement to solve some of
the issues with scaling.

There is also a paper being presented next week at the NetDevConf.org
conference about Busy Polling, by Eric Dumazet from Google, and videos
will be posted eventually.


Please see (in the linux kernel source) Documentation/sysctl/net.txt
busy_read

Low latency busy poll timeout for socket reads. (needs
CONFIG_NET_RX_BUSY_POLL) Approximate time in us to busy loop waiting
for packets on the device queue. This sets the default value of the
SO_BUSY_POLL socket option. Can be set or overridden per socket by
setting socket option SO_BUSY_POLL, which is the preferred method of
enabling. If you need to enable the feature globally via sysctl, a
value of 50 is recommended. Will increase power usage.
Default: 0 (off)

busy_poll

Low latency busy poll timeout for poll and select. (needs
CONFIG_NET_RX_BUSY_POLL) Approximate time in us to busy loop waiting
for events. Recommended value depends on the number of sockets you poll
on. For several sockets 50, for several hundreds 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_BUSY_POLL set will be busy polled,
so you want to either selectively set SO_BUSY_POLL on those sockets or
set sysctl.net.busy_read globally.
Will increase power usage.
Default: 0 (off)
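
As a minimal sketch of the per-socket method the text above calls the
preferred one, the snippet below sets SO_BUSY_POLL on a single socket.
The helper name is made up. The fallback option number 46 is an
assumption: it is the value used by the generic Linux socket headers,
and some architectures use a different number. The kernel may also
refuse the request (for example for unprivileged processes raising the
value), so failure is reported rather than raised.

```python
import socket
import sys

# Use the module constant if this Python build has one; otherwise fall
# back to 46, the asm-generic Linux value (assumption, see note above).
SO_BUSY_POLL = getattr(socket, "SO_BUSY_POLL", 46)


def enable_busy_poll(sock, usecs=50):
    """Try to set SO_BUSY_POLL on one socket; return True on success."""
    if not sys.platform.startswith("linux"):
        return False  # the option number above is only meaningful on Linux
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BUSY_POLL, usecs)
    except OSError:
        return False  # not permitted, or not supported by this kernel
    return True


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_socket:
        print("busy poll enabled:", enable_busy_poll(udp_socket, 50))
```

The value 50 simply mirrors the figure recommended in the quoted
documentation; the right number depends on the workload.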




Re: [E1000-devel] ixgbe port missing, "PCI INT B: failed to register GSI"

2016-12-07 Thread Jesse Brandeburg
On Tue, 6 Dec 2016 17:24:18 -0800
Ben Greear  wrote:

> On 12/06/2016 05:15 PM, Fujinaka, Todd wrote:
> > Attachments don't work here. You'll have to file a bug on sourceforge, or 
> > file an IPS for factory support (and tell me the number so it doesn't sit). 
> >  
> 
> Ok, here it is inline then.  lspci -vvv output is at the end of the dmesg 
> output.

Ben, please see if you can enable 64-bit BARs (in the BIOS); the issue
might go away.

You could also try enabling the pci= option that allows the kernel to remap
BARs, but that may not even be necessary if 64-bit works.

The reason I suggest the above is because I saw some BAR mapping errors
in your dmesg (which I believe is why you can't get MSI-X resources)

Sorry I didn't have more time to be specific, but I wanted to at least
get this out to you now.



Re: [E1000-devel] i40e: xl710 chipset in 4.8 kernel

2016-11-21 Thread Jesse Brandeburg
On Sat, 19 Nov 2016 23:05:57 +0300
Yavuz Selim Komur  wrote:

> Hi,
> 
> i40e drops all UDP traffic after upgrading from the 4.7 to the 4.8 Linux kernel.
> 
> All DHCP and DNS traffic stops; i40e does not forward any UDP.
> 
> Is this possible?

Hi Yavuz, please be more specific, as there may be some reason for
your issue, but we can't tell from the data you provided.

output of:
ethtool -i
dmesg from boot
after a few minutes of "no UDP" please dump ethtool -S output
Are you plugged into a switch?

You should probably file a bug at https://e1000.sf.net/bugs so that you
can attach files, as they won't be delivered well by the list.

Also, are you using a distro based kernel, and what is the exact kernel
version you're using (output of uname -a)

Thanks!



Re: [E1000-devel] [Intel-wired-lan] i40e card Tx resets

2016-03-18 Thread Jesse Brandeburg
On Thu, 17 Mar 2016 14:56:14 -0400
Sowmini Varadhan  wrote:

> On (03/17/16 10:20), zhuyj wrote:
> > 1. modprobe NET_PKTGEN
> > 
> > 2. download the tar file and uncompress to any directory.
> > This tar file is from kernel. It is in samples/pktgen/
> > 
> > 3. cd pktgen
> > 
> > 4. pktgen_sample02_multiqueue.sh -i ethx -s size -t cpu_number
> 
> Indeed, I see the same thing as you, and it was very easy to 
> reproduce. It was very interesting that the problem can happen with
> as few as 3 threads, at which point I see the TX hang at exactly
> -s 12305 

Okay, sorry I hadn't jumped into this thread yet.

I can unequivocally tell you that what Sowmini saw with the MDD with
stack based RDS-STRESS testing is *NOT* the same as what you're seeing
while using pktgen with invalid huge skb->data buffers.

We can ask on netdev if the driver should defend against this kind of
input to hard_start_xmit (transmit routine), but the driver doesn't
check the maximum length of the skb to see if it is invalid, because
the stack can never build (only pktgen can) these invalid SKBs.

The issue is that pktgen builds skb->data with a contiguous buffer of
whatever size transmit requested, (regardless of MTU) and then sends it
straight to the transmit routine, no segmentation flags, no MSS set.

This causes the driver to build a transmit descriptor with an invalid
length, which the hardware then "ASSERTS" on by issuing an MDD
interrupt and freezing the bad acting queue.
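
One way to stay clear of this when running the pktgen sample scripts is to
compare the requested size against the interface MTU before starting. The
sketch below is only an illustration: the helper names are made up, and it
assumes the pktgen size counts the 14-byte Ethernet header but not the CRC.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a pktgen size fits in a single frame
# for the given MTU (assumption: size includes the 14-byte Ethernet header).
size_fits_mtu() {
    size="$1"
    mtu="$2"
    [ "$size" -le $((mtu + 14)) ]
}

# Hypothetical wrapper: reads the MTU of a device from sysfs and reports
# whether the requested size is safe to hand to pktgen.
check_pktgen_size() {
    dev="$1"
    size="$2"
    mtu=$(cat "/sys/class/net/$dev/mtu") || return 2
    if size_fits_mtu "$size" "$mtu"; then
        echo "ok: size $size fits MTU $mtu on $dev"
    else
        echo "refusing: size $size exceeds MTU $mtu (+14) on $dev"
        return 1
    fi
}
```

With a 1500-byte MTU this accepts sizes up to 1514 and rejects the 12305
mentioned above.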

> I see:
> i40e :82:00.0: TX driver issue detected, PF reset issued
> i40e :82:00.0 eth2: VSI_seid 390, Hung TX queue 0, tx_pending: 492, 
> NTC:0x140, HWB: 0x140, NTU: 0x12c, TAIL: 0x12c
> 
> I think the common factor in both our test cases is that we have some
> kernel thread that can efficiently send packets without any context
> switches. 

You've found a red herring (mistakenly connected two separate events)
so I think you can stop going down this path (pktgen).

> Has anyone here seen this before? I'll see if I can find some cycles
> to figure this out, if not, maybe its worth bringing up on netdev,
> to see if others have seen this, and to draw some patterns.

We don't need to bring it up on netdev.  We have a way to troubleshoot
MDDs that I can send to you, if you want to do the work.  Otherwise we
need to have some time to reproduce here.

> > If size is set to a big number, the similar defect will occur.
> > Adjust this size to a appropriate number, my defect will not occur.
> > 
> > In the test, I found some types igb nic, such as i210, will work
> > well no matter the size is a big number.
> > some nic, such as 82580, it will not work well if the size is too big.

This is mostly a combination of driver implementation and how the
hardware handles a descriptor that is too large.  The driver *could*
check to make sure the skb->data is never too large, but in that same
vein, we *could* fix pktgen to never send a frame greater than MTU down
to the driver.

> > 
> > As such, I think my problem results from the hardware and the big
> > size triggers this problem.
> > 
> > I hope this can help us all.

Unfortunately Zhu's problem with pktgen is not a reproducer of
Sowmini's problem.

In the case of pktgen, it is a "don't do that, because it hurts" kind of
bug. In the case of rds-stress, we need to reproduce it here and figure
out what hardware constraint the driver is violating during set up of
the transmit.




Re: [E1000-devel] 82574l GB network controller for WinCE 5.0

2015-08-26 Thread Jesse Brandeburg
On Tue, 25 Aug 2015 11:33:40 +0300
Eli Kedem eli.ke...@ardix.co.il wrote:

> I am developing an NDIS (v5.1) miniport device driver for the 82574l GB
> network controller for WinCE 5.0. The main problem is that the BSP is
> very deficient and KITL does not work and the board does not have JTAG
> interface either. I have no way to debug the driver only by using
> RETAILMSG output to the hyperterminal. I managed to initialize the damn
> thing but I must have missed something because the miniport upper edge
> functions to handle interrupts like MiniportISR and
> MiniportHandleInterrupt are not called by the upper NDIS drivers, and I
> have no way to figure out why.
> 
> I appreciate if someone has developed the same driver for
> WinCE/WinXp/w7 and can send me the source code.

Hi Eli, I think you have the wrong list, as we cover open source
Linux issues here. I would be a bit surprised if Intel doesn't already
have a driver that does what you want, but you should contact your local
Intel Field Application Engineer to check a) whether a driver already
exists, and b) whether they can support you with driver design.  I
suggest you start with your local sales office.




Re: [E1000-devel] i40e - Unqualified modules was detected

2015-08-21 Thread Jesse Brandeburg
On Fri, 21 Aug 2015 20:44:06 +
John McDowall jmcdow...@paloaltonetworks.com wrote:

> Hi,
> 
> I am trying to get a dual 10G X7100 interface up and on boot I am seeing the
> following message:
> 
> i40e :04:00.0 p1p1: the driver failed to link because an unqualified
> module was detected.
> [5.092641] IPv6: ADDRCONF(NETDEV_UP): p1p1: link is not ready
> [7.875628] i40e :04:00.1 p1p2: the driver failed to link because an
> unqualified module was detected.

This is typically because you don't have an Intel validated module
plugged in.  What kind of media do you have plugged into p1p1/2?


> [7.875703] IPv6: ADDRCONF(NETDEV_UP): p1p2: link is not ready
> 
> My system is a Dell R610, running CENTOS 7.0 I have upgraded the drivers and
> the flash:

Thanks for doing that first, it helps.

The output of "ethtool p1p1" would be useful.  Also, if you're using a
fiber module, please send a picture of the module label; if you're using
a direct attach cable, a picture of the cable end label or of the
packaging it came with.

 
 
> [root@localhost ~]# uname -a
> Linux localhost.localdomain 3.10.0-229.11.1.el7.x86_64 #1 SMP Thu Aug 6
> 01:06:18 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> [root@localhost ~]# ethtool -i p1p1
> driver: i40e
> version: 1.3.38
> firmware-version: 4.53 0x80001dc0 0.0.0
> bus-info: :04:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> [root@localhost ~]#
> 
> Any ideas of what could be wrong?
> 
> Regards
> 
> John McDowall




Re: [E1000-devel] [net-next 8/9] i40e/i40evf: Bump i40e/i40evf version

2015-04-01 Thread Jesse Brandeburg
On Wed, 1 Apr 2015 11:52:37 +0200
Ronald van der Pol ronald.vander...@rvdp.org wrote:

> On Thu, Mar 26, 2015 at 23:01:46 -0700, Jeff Kirsher wrote:
> 
> > netdev is not the right mailing list for this question.
> > Adding e1000-devel mailing list...
> 
> Sorry about that. I still have problems with getting the Intel X4DACBL3
> QSFP to 4x SFP breakout cable working with the i40e. I have also upgraded
> the NVM, but it did not help. Below is the modprobe output.

I think the piece you're missing is that you need to run the QCU (QSFP+
Configuration Utility) to switch the port from 40G to 4x10 mode.

At that point the interface will show up as four physical functions
81:0.0 - 81:0.3


 
> [root@boron src]# ethtool -i ens5f0
> driver: i40e
> version: 1.2.37
> firmware-version: f4.33.31377 a1.2 n4.42 e1932

This is the right NVM to run QCU on top of.

> Apr  1 12:21:27 boron kernel: i40e: Intel(R) Ethernet Connection XL710
> Network Driver - version 1.2.37

This is a good driver to be running. :-)

> PS I have a 3rd party QSFP-QSFP DAC cable + Intel X4DACBL3 inserted. Might
> the 3rd party DAC cable confuse the driver? I need to travel to get
> physical access to the server, so I cannot easily pull the cable.

I don't know, try QCU first.  The adapter and driver can't
automatically switch to 4x10 mode.

> PPS I understand:
> - 3rd party optics are not supported
> - max of 4 mac addresses, so 4x10 is OK, 4x10 + 1x40 is not OK
> Is this correct?

right, AFAIK



Re: [E1000-devel] i40e: trivial fixes

2014-12-02 Thread Jesse Brandeburg
On Tue, 2 Dec 2014 16:01:07 +0300
Dan Carpenter dan.carpen...@oracle.com wrote:

> Hello Jesse Brandeburg,
> 
> The patch 895106a577c4: "i40e: trivial fixes" from Nov 26, 2013,
> leads to the following static checker warning:
> 
>   drivers/net/ethernet/intel/i40e/i40e_hmc.c:107 i40e_add_sd_table_entry()
>   error: potentially using uninitialized 'ret_code'.
 

Thanks Dan for the report, we are looking into it.  Appreciate the
feedback!




Re: [E1000-devel] [patch net-next] i40e: remove dead fdb code

2014-11-20 Thread Jesse Brandeburg
On Thu, 20 Nov 2014 14:10:29 +0100
Jiri Pirko j...@resnulli.us wrote:

> This code is not used now and also it contains some weird ifdefs. So
> remove it for now. It can be added when needed.
 

First, thanks for looking at our code.

But NAK: the code just needs to have the #ifdefs removed.

In addition, the fdb_del and fdb_dump functions are unnecessary and
were submitted by mistake.

I will draft up a patch today and send it (it can go through
Jeff Kirsher's i40e tree, if that's okay with DaveM).

Thanks, 
 Jesse



Re: [E1000-devel] [net-next v5 8/8] i40e: include i40e in kernel proper

2013-09-06 Thread Jesse Brandeburg
On Fri, 6 Sep 2013 14:01:41 -0400
David Miller da...@davemloft.net wrote:
> Please rename this Kbuild file to the normal Makefile instead of
> trying to be different from every single other driver in the
> networking for the sake of an issue that is your, and your problem
> alone.

Thanks Dave, will do, I'm preparing the patch now.

> You guys should really be grateful that anyone at all not being paid
> to do so is reviewing such a huge body of code for you, rather than
> complaining that all the issues weren't discovered the first time
> this series was posted.

We *are* really grateful for all the effort of any/all reviewers.  I
would like to personally thank you, Dave, Joe Perches, Ben Hutchings,
and Stephen Hemminger for the non-trivial amount of time spent on
reviewing this patch set.



Re: [E1000-devel] [PATCH net-next] drivers:net: Convert dma_alloc_coherent(...__GFP_ZERO) to dma_zalloc_coherent

2013-08-27 Thread Jesse Brandeburg
On Mon, 26 Aug 2013 22:45:23 -0700
Joe Perches j...@perches.com wrote:

> __GFP_ZERO is an uncommon flag and perhaps is better
> not used.  static inline dma_zalloc_coherent exists
> so convert the uses of dma_alloc_coherent with __GFP_ZERO
> to the more common kernel style with zalloc.
> 
> Remove memset from the static inline dma_zalloc_coherent
> and add just one use of __GFP_ZERO instead.
> 
> Trivially reduces the size of the existing uses of
> dma_zalloc_coherent.
> 
> Realign arguments as appropriate.
> 
> Signed-off-by: Joe Perches j...@perches.com

e1000 and ixgb bits:

Acked-by: Jesse Brandeburg jesse.brandeb...@intel.com



Re: [E1000-devel] e1000e on thinkpad x60: interrupt problem

2013-07-09 Thread Jesse Brandeburg
On Tue, 9 Jul 2013 22:48:54 +0200
Pavel Machek pa...@ucw.cz wrote:

> Yeah, of course you need to ask e1000e if it generated the
> interrupt. That part works. The part that actually generates the
> interrupt does not. Take a look at original mail...
> 
> packet comes
> e1000e sets E1000_ICR_INT_ASSERTED bit
> e1000e tries to generate an interrupt and fails
> 50msec passes

^^ That's the ASPM timeout length.

> AHCI generates interrupt
> all the handlers are called
> AHCI processes its interrupt, handles disk read
> e1000_intr notices E1000_ICR_INT_ASSERTED bit, delivers the packet.
> 
> Network still works, only slowly. Ping goes lower when I use the
> disk. That matches what I see.
> 
> Do you have other explanation?

Regardless of what others are saying, I believe you have an issue with
ASPM being enabled.  All the discussion about shared interrupts is
just a distraction.  This issue would still occur (and just be worse)
without a shared interrupt.

You already mentioned that a kernel hack to disable ASPM fixes it, but
you can just boot with a different option to turn off ASPM:

pcie_aspm=off

There are known issues with ASPM on this part, and it definitely needs
to be off.  If your BIOS has the option to turn it off, that is the
best way to disable it; the second choice is to turn it off using the
kernel option.
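
After rebooting, it is worth confirming that the option actually reached the
kernel. A small sketch, with made-up helper names, that checks a kernel
command line string for the option:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line
# contains pcie_aspm=off as a separate word.
cmdline_has_aspm_off() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reports on the command line of the running kernel.
report_aspm_boot_option() {
    if cmdline_has_aspm_off "$(cat /proc/cmdline 2>/dev/null)"; then
        echo "pcie_aspm=off is present on the kernel command line"
    else
        echo "pcie_aspm=off is NOT present on the kernel command line"
    fi
}

# The per-device link state can also be inspected (as root) with:
#   lspci -vv | grep -i aspm
```

This only shows that the option was passed at boot; the lspci output is what
shows whether ASPM is reported as disabled on the link for the device.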

Hope this helps,
Jesse



Re: [E1000-devel] [RFC 0/7] Fixing dma mask setting in various network drivers

2013-06-11 Thread Jesse Brandeburg
On Tue, 11 Jun 2013 00:08:49 +0100
Russell King - ARM Linux li...@arm.linux.org.uk wrote:

> While looking at the way coherent DMA masks are handled (and the
> fact many drivers write directly to the mask) I stumbled across
> this set of oddities in various network drivers, which looks like
> it's been cut'n'pasted.
> 
> I haven't yet tested these patches in any way, which is one reason
> I'm sending them out as an RFC.  The other reason is to find out
> if other people agree that these are indeed fixes.
> 
>  drivers/net/ethernet/brocade/bna/bnad.c   |7 +++
>  drivers/net/ethernet/intel/e1000e/netdev.c|   11 +--
>  drivers/net/ethernet/intel/igb/igb_main.c |   11 +--
>  drivers/net/ethernet/intel/igbvf/netdev.c |   11 +--
>  drivers/net/ethernet/intel/ixgb/ixgb_main.c   |9 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   11 +--
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   11 +--
>  7 files changed, 32 insertions(+), 39 deletions(-)

Thanks Russell,

The intel driver changes seem valid (we are testing them now).
According to DMA-API-HOWTO, the coherent mask will always succeed if
the regular mask succeeded, so the code can be further simplified as
well to basically match the example in DMA-API-HOWTO.

This is my proposed change to the intel drivers.  Comments?

+	if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(64))) {
+		pci_using_dac = true;
+		/* coherent mask for the same size will always succeed if
+		 * dma_set_mask does
+		 */
+		dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
+	} else if (!dma_set_mask(&pdev->dev, DMA_BIT_MASK(32))) {
+		pci_using_dac = false;
+		dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
+	} else {
+		/* set the error code first so the message reports it */
+		err = -EIO;
+		dev_err(&pdev->dev, "%s: DMA configuration failed: %d\n",
+			__func__, err);
+		goto err_dma;
 	}




Re: [E1000-devel] [RFC 0/7] Fixing dma mask setting in various network drivers

2013-06-11 Thread Jesse Brandeburg
On Tue, 11 Jun 2013 13:35:05 -0700
Russell King - ARM Linux li...@arm.linux.org.uk wrote:
> As part of my review of all this stuff, I'm wondering whether a helper
> to set both masks makes sense.  Something like:
> 
> static inline int dma_set_masks(struct device *dev, u64 mask)

It doesn't need to be inline; it is never called in a hot path.

> {
> 	int ret = dma_set_mask(dev, mask);
> 	if (ret == 0)
> 		dma_set_coherent_mask(dev, mask);
> 	return ret;
> }
> 
> dma_set_masks() is a little too close to dma_set_mask() though; and

How about dma_set_mask_and_coherent(...)?

> such a function looks like it would be usable for 20 odd drivers
> currently.  The plus point is that it may help to prevent this kind
> of issue in the future...
> 
> Thoughts?

I really like the idea of consolidating this in the kernel with a
global helper.




Re: [E1000-devel] Memory Corruption with e1000

2013-06-06 Thread Jesse Brandeburg
On Thu, 6 Jun 2013 09:38:50 -0700
Peter LaDow pet...@gocougs.wsu.edu wrote:

 On Thu, Jun 6, 2013 at 12:30 AM, Peter P Waskiewicz Jr
 peter.p.waskiewicz...@intel.com wrote:
  What about the pre-emption behavior of the kernel?  Namely Processor type
  and Features - Preemption Model.  Are you using no preemption, or forced
  preemption?
 
 Ok.  I've done testing.  Yes, we were building with PREEMPT_FULL.
 I've done some further testing and can re-create the problem on
 vanilla, non-preempt kernels.  See below.
 
 # uname -a
 Linux (none) 3.0.80-rt108 #2 Thu Jun 6 16:09:35 UTC 2013 ppc GNU/Linux
 
 And I still get the slab corruption leading up to the kernel panic:
 
 Slab corruption: size-2048 start=ee2b2070, len=2048
 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
 Last user: [c0208514](skb_release_data+0xb4/0xc8)
 020: 6b 6b ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00

That is quite clearly a broadcast; it looks to me like a VLAN-tagged
packet (0x8100), maybe on VLAN 0xf2?

So this means that the receive unit of the e1000 is not being stopped
completely (or is restarted by something), but the memory of the DMA
buffer (the 2kB allocation) is being freed and then still DMA'd to.

 030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 0d ed 47
 040: d9 87 0a f1 0a ea 00 00 00 00 00 00 0a f1 0a ea
 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 060: 00 00 09 81 d2 0f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
 Next obj: start=ee2b2888, len=2048
 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
 Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 Slab corruption: size-2048 start=ed401480, len=2048
 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
 Last user: [c0208514](skb_release_data+0xb4/0xc8)
 020: 6b 6b ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00
 030: 45 00 01 3e 3e 1a 00 00 80 11 ca c0 0a ca 0d 42

Same thing here, but this is an IP packet.

This is clearly a network adapter putting frames into memory that has
been freed.

I will see if someone here can reproduce this issue, but it seems quite
clear what is happening; we just need to figure out why.
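
The reading of the two dumps above can be checked mechanically. A small
sketch, with a made-up helper name, that takes the frame bytes as two-digit
hex words (starting at the destination MAC, i.e. right after the two leading
6b bytes on the "020:" line) and prints the Ethernet header fields:

```shell
#!/bin/sh
# Hypothetical helper: decode the start of an Ethernet frame.
# Arguments: at least 18 bytes as two-digit hex words, beginning with
# the destination MAC address.
decode_eth() {
    dst="$1:$2:$3:$4:$5:$6"
    src="$7:$8:$9:${10}:${11}:${12}"
    etype="${13}${14}"
    if [ "$etype" = "8100" ]; then
        # 802.1Q tag: the low 12 bits of the next two bytes are the VLAN ID,
        # and the real EtherType follows the tag.
        vid=$(( 0x${15}${16} & 0x0fff ))
        echo "dst=$dst src=$src vlan=$vid ethertype=0x${17}${18}"
    else
        echo "dst=$dst src=$src ethertype=0x$etype"
    fi
}

# First corrupted object in the dump:
decode_eth ff ff ff ff ff ff 00 0d ed 47 d9 87 81 00 00 f2 08 06
# Second corrupted object in the dump:
decode_eth ff ff ff ff ff ff e0 db 55 e4 ce f9 08 00 45 00 01 3e
```

For the first dump this reports a broadcast on VLAN 242 (0xf2) carrying
EtherType 0x0806 (ARP); for the second, a broadcast with EtherType 0x0800
(IPv4).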


 040: 0a ca 0d ff 00 8a 00 8a 01 2a a5 96 11 0e af 81
 050: 0a ca 0d 42 00 8a 01 14 00 00 20 45 42 45 4f 45
 060: 45 46 43 45 4c 45 50 45 44 45 49 45 4f 45 43 43
 070: 41 43 41 43 41 43 41 43 41 41 41 00 20 46 44 45
 Prev obj: start=ed400c68, len=2048
 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
 Last user: [c0209b8c](__netdev_alloc_skb+0x28/0x60)
 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
 Unable to handle kernel paging request for data at address 0x20454c45
 Faulting instruction address: 0xc0062498
 Oops: Kernel access of bad area, sig: 11 [#1]
 SEL35xx Platform
 Modules linked in:
 NIP: c0062498 LR: c02084d8 CTR: c000cbbc
 REGS: ee85bc60 TRAP: 0300   Not tainted  (3.0.80-rt108)
 MSR: 9032 EE,ME,IR,DR  CR: 24008248  XER: 
 DAR: 20454c45, DSISR: 2000
 TASK = ef3e5830[4616] 'ifconfig' THREAD: ee85a000
 GPR00:  ee85bd10 ef3e5830 20454c45 2d746baa 05f2 0002 
 GPR08: c03b14e4 ed7471a8 ee85bcd0 5c26  10087a48 bfe0e41c 10064ae4
 GPR16: 10064bc0 bfe0e40c  bfe0e3f4 0228  8914 c019a488
 GPR24: c019a9cc ed70f4b0 005c ed70f340 ef063120  0001 ee62bd30
 NIP [c0062498] put_page+0x0/0x34
 LR [c02084d8] skb_release_data+0x78/0xc8
 Call Trace:
 [ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
 [ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
 [ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
 [ee85bd70] [c0198d40] e1000_close+0x30/0xb4
 [ee85bd90] [c0212408] __dev_close_many+0xa0/0xe0
 [ee85bda0] [c02141a0] __dev_close+0x2c/0x4c
 [ee85bdc0] [c0210a58] __dev_change_flags+0xb8/0x140
 [ee85bde0] [c0212324] dev_change_flags+0x1c/0x60
 [ee85be00] [c0267594] devinet_ioctl+0x2a4/0x700
 [ee85be60] [c026839c] inet_ioctl+0xc8/0xfc
 [ee85be70] [c02006d4] sock_ioctl+0x260/0x2a0
 [ee85be90] [c009145c] vfs_ioctl+0x2c/0x58
 [ee85bea0] [c0091bc8] do_vfs_ioctl+0x610/0x698
 [ee85bf10] [c0091ca8] sys_ioctl+0x58/0x88
 [ee85bf40] [c000e674] ret_from_syscall+0x0/0x38
 --- Exception: c01 at 0xff35a3c
 LR = 0xff359a0
 Instruction dump:
 419e0018 3c80c006 38630180 38842abc 38a0 4bfffe65 80010014 bbc10008
 38210010 7c0803a6 4e800020 4b54
 8003 7c691b78 700bc000 41a20008
 Kernel panic - not syncing: Fatal exception
 Call Trace:
 [ee85bb90] [c0007b80] show_stack+0x58/0x154 (unreliable)
 [ee85bbd0] [c001c3a8] panic+0xa8/0x1cc
 [ee85bc20] [c000b1f0] die+0x178/0x19c
 [ee85bc40] [c0011a44] bad_page_fault+0xe8/0xfc
 [ee85bc50] [c000eb14] handle_page_fault+0x7c/0x80
 --- Exception: 300 at put_page+0x0/0x34
 LR = skb_release_data+0x78/0xc8
 [ee85bd10] []   (null) (unreliable)
 [ee85bd20] [c020810c] __kfree_skb+0x18/0xbc
 [ee85bd30] [c0195734] e1000_clean_rx_ring+0x10c/0x1a4
 [ee85bd60] [c01957f4] e1000_clean_all_rx_rings+0x28/0x54
 

Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-05-21 Thread Jesse Brandeburg
On Tue, 21 May 2013 19:24:24 +0100
Sam Crawford samcrawf...@gmail.com wrote:
> To be clear, this doesn't just affect this one hosting provider - it seems
> to be common to all of our boxes. The issue only occurs when the sender is
> connected at 1Gbps, the RTT is reasonably high ( ~60ms), and we use TCP.
> 
> By posting here I'm certainly not trying to suggest that the e1000e driver
> is at fault... I'm just running out of ideas and could really use some
> expert suggestions on where to look next!

I think you're overwhelming some intermediate buffers with send data
before they can drain, due to the burst send nature of TCP when
combined with TSO.  This is akin to bufferbloat.

Try turning off TSO using ethtool.  This will restore the native
feedback mechanisms of TCP.  You may also want to reduce or eliminate
the send side qdisc queueing (the default is 1000, but you probably
need a lot less), but I don't think it will help as much.

ethtool -K ethx tso off gso off

You may even want to turn GRO off at both ends, as GRO will be messing
with your feedback as well.

ethtool -K ethx gro off
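
To confirm the settings took effect, the current state can be read back with
"ethtool -k" (lower-case k). A minimal sketch, using a made-up helper name,
that extracts the state of one feature from that output:

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool -k <dev>` output on stdin and
# prints "on" or "off" for the feature named in $1.
offload_state() {
    sed -n "s/^[[:space:]]*$1: *\([a-z]*\).*/\1/p"
}

# Example (interface name is illustrative):
#   ethtool -k ethx | offload_state tcp-segmentation-offload
#   ethtool -k ethx | offload_state generic-segmentation-offload
#   ethtool -k ethx | offload_state generic-receive-offload
```

The feature names used in the example are the long names that ethtool prints
for TSO, GSO and GRO.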

I'm a bit surprised that this issue isn't being understood natively by
the Linux stack.  That said, GRO and TSO are really focused on LAN
traffic, not WAN.

Jesse



Re: [E1000-devel] [PATCH v2 net-next 0/4] net: low latency Ethernet device polling

2013-05-20 Thread Jesse Brandeburg
On Mon, May 20, 2013 at 1:09 PM, Jeff Kirsher
jeffrey.t.kirs...@intel.com wrote:

 On Sun, 2013-05-19 at 22:20 +0300, Eliezer Tamir wrote:
  On 19/05/2013 22:06, Or Gerlitz wrote:
   On Sun, May 19, 2013 at 1:25 PM, Eliezer Tamir
   eliezer.ta...@linux.intel.com wrote:
   This is an updated version of the code we posted on February.
  
   Last time you've placed a copy of the patchset in the rfc branch of
   git://github.com/jbrandeb/lls.git  - can you repost there V2 too?


The latest set (the v3 changes) was posted to
the rfcv2 branch on git://github.com/jbrandeb/lls.git


Re: [E1000-devel] pci probe of 82574 fails

2013-01-29 Thread Jesse Brandeburg
On Tue, 29 Jan 2013 13:49:17 -0800
akepner akep...@riverbed.com wrote:

> On Fri, Jan 25, 2013 at 08:25:17PM +, Ronciak, John wrote:
> > This could be BIOS configuration as well.  Check the BIOS version as Tushar
> > says but also look at how you have the device/slot configured in the BIOS.
> > 
> 
> The "probe of :07:00.0 failed with error -2" is seen
> with only a few systems. All of the systems I've checked
> (working, and non-working) are using the same BIOS version,
> configured identically, and have hardware of the same type.
> 
> I instrumented the e1000e driver a bit and verified that
> e1000_get_hw_semaphore_82574() call is where the error
> is coming from.
> 
> We have some evidence that power cycling the system may
> cause the bug to go away, but it's hard to be very confident
> of that, since we've seen the bug so few times.
 

If the semaphore acquisition fails, you might be able to just force it
to 0, then reacquire it normally.  This is something we might be able
to consider adding to our code to work around these strange cases.

On the flip side, whatever is acquiring it and not releasing it may
continue to cause problems.



Re: [E1000-devel] e1000e: ethtool -t fails when i/f is up

2012-11-16 Thread Jesse Brandeburg
On Fri, 16 Nov 2012 14:58:13 -0800 akepner akep...@riverbed.com wrote:

> With e1000e (versions 1.2.20, and 2.1.4) we've noticed that the
> ethtool selftest fails with a miscompare when the interface is
> up, but succeeds when it's down:

Which hardware are you using?
# lspci -nn

What does this show?
# ethtool lan0_0

Is the cable plugged in while doing this?  Is there any output from dmesg?
What happens if you raise the message level?

# ethtool -s lan0_0 msglvl 0x
# ethtool -t lan0_0

This may output too much info, I'm not sure.

I just know that the maintainers of the driver will want this info.



Re: [E1000-devel] [PATCH net] e1000e: Change wthresh to 1 to avoid possible Tx stalls.

2012-10-09 Thread Jesse Brandeburg
> > > Jesse did not share any performance numbers with me, I am sure he can
> > > give some background tomorrow when he is back online.
> > > 
> > > I am working on an alternative patch now and should have something to
> > > share tomorrow.
> > Please allow me to ask if there's any progress here?
> > 
> > I've tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
> > and could still observe severe latency (up to 3000ms) spikes.
> > 
> > Applying Hiroaki's suggested patch did fix this for me as well.
> > [please note as well that I didn't have this issue in any 3.4.x kernel
> > before - so +1 for fixing the regression]

I'm not sure what went wrong internally here that this hasn't been
fixed, and I'm personally embarrassed.  I am working on it until I have
a patch/solution.

I am currently trying to reproduce the issue, but I am in some weird
"how to use BQL" limbo; the lack of documentation on user-level usage of
BQL is slowing me down.

Hints or clues (I'm trying to follow the repro steps mentioned in
some related threads) are appreciated.



Re: [E1000-devel] Intel 82546GB chip does not work with OpenVSwitch

2012-09-07 Thread Jesse Brandeburg
On Fri, 7 Sep 2012 12:37:04 +0200
Timm Essigke timm.essi...@uni-bayreuth.de wrote:
> I hope you can understand the cause of the problem from the ethregs
> output included in the files.
> 
> Thank you very much!

Looks like the attachment(s) either weren't included or didn't make it
through the list filters. Can you upload them to pastebin or email them
to me directly?

A better option at this point may be to file a bug at our e1000.sf.net
bug tracker, where the attachments can be added.  I started creating the
ticket for you but realized that you probably won't be able to attach
anything unless you're the owner.




Re: [E1000-devel] ioatdma 0000:00:0a.1: Channel halted, chanerr = 2

2012-04-30 Thread Jesse Brandeburg
On Mon, 30 Apr 2012 22:31:26 +
John Adams john.ad...@avid.com wrote:

 Dear e1000-devel,
 
 I'm wondering what kernel versions people are happily using in
 production with the ixgbe driver?
 
 I'm having network stability and performance issues with a 2.6.32-131
 modified Red Hat el6 on a quad core Xeon Jasper Forest cpu.  My nic is
 X520/82599 dual port.
 
 I wonder if this could be an ixgbe or ioatdma problem.
 Ixgbe is not mentioned in my stack traces.  Hoping for advice.
 
 I could try a later kernel, especially one recommended by a
 happy ixgbe user.

if you're having issues you could blacklist ioatdma.  It is really not
necessary, unless you were really benefiting from dca, which is
unlikely.

Someone should check if there are any bugzillas at redhat for ioatdma
 
 Any comment is much appreciated.
 
 Here's what I see. (just one cpu for brevity). This has been reported when 
 using an old version of
 ixgbe as well as 3.9.15-NAPI.
 
 ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
 ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
 ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
 ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
 ioatdma 0000:00:0a.1: Channel halted, chanerr = 2
 ioatdma 0000:00:0a.1: ioat2_timer_event: Channel halted (2)
 BUG: scheduling while atomic: process_name/6888/0x1301
 Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler sunrpc tcp_htcp 
 sr_mod cdrom raid456 async_raid6_recov async_pq raid6_pq async_xor xor 
 async_memcpy async_tx dm_mod ses enclosure sg i2c_i801 i2c_core iTCO_wdt 
 iTCO_vendor_support e1000e ioatdma ixgbe(U) dca pm8001(U) libsas 
 scsi_transport_sas ext3 jbd mbcache sd_mod crc_t10dif usb_storage pata_acpi 
 ata_generic ata_piix [last unloaded: scsi_wait_scan]
 Pid: 6888, comm: process_name Not tainted 2.6.32-foo-0 #7
 Call Trace:
  IRQ  [8104dab6] ? __schedule_bug+0x66/0x70
  [81477502] ? thread_return+0x5db/0x779
  [8104f05d] ? scheduler_tick+0xdd/0x280
  [810128e9] ? read_tsc+0x9/0x20
  [81090d03] ? ktime_get+0x63/0xe0
  [81029a2d] ? lapic_next_event+0x1d/0x30
  [a01c558c] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
  [8105748a] ? __cond_resched+0x2a/0x40
  [a01c558c] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
  [814777f0] ? _cond_resched+0x30/0x40
  [8100df96] ? is_valid_bugaddr+0x16/0x40
  [8124e4df] ? report_bug+0x1f/0xc0
  [8100f2af] ? die+0x7f/0x90
  [8147a184] ? do_trap+0xc4/0x160
  [a01c5330] ? ioat2_timer_event+0x0/0x270 [ioatdma]
  [a01c5330] ? ioat2_timer_event+0x0/0x270 [ioatdma]
  [8100ce55] ? do_invalid_op+0x95/0xb0
  [a01c558c] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
  [8105ff11] ? vprintk+0x1d1/0x4f0
  [81028e89] ? native_send_call_func_single_ipi+0x39/0x40
  [8109c081] ? generic_exec_single+0xb1/0xc0
  [8100befb] ? invalid_op+0x1b/0x20
  [a01c5330] ? ioat2_timer_event+0x0/0x270 [ioatdma]
  [a01c558c] ? ioat2_timer_event+0x25c/0x270 [ioatdma]
  [a01c5579] ? ioat2_timer_event+0x249/0x270 [ioatdma]
  [810128e9] ? read_tsc+0x9/0x20
  [81071ea7] ? run_timer_softirq+0x197/0x340
  [810676a1] ? __do_softirq+0xc1/0x1d0
  [8100c26c] ? call_softirq+0x1c/0x30
  EOI  [8100dea5] ? do_softirq+0x65/0xa0
  [81067fe8] ? local_bh_enable_ip+0x98/0xa0
  [814798fb] ? _spin_unlock_bh+0x1b/0x20
  [a01c486f] ? ioat2_cleanup_tasklet+0x8f/0xa0 [ioatdma]
  [a01c3743] ? ioat2_is_complete+0x83/0xd0 [ioatdma]
  [8141c38f] ? tcp_recvmsg+0x75f/0xe90
  [81476f75] ? thread_return+0x4e/0x779
  [8143c55c] ? inet_recvmsg+0x5c/0x90
  [813d53b3] ? sock_recvmsg+0x133/0x160
  [81086100] ? autoremove_wake_function+0x0/0x40
  [8109810e] ? futex_wake+0x10e/0x120
  [8109a071] ? do_futex+0x121/0xb00
  [8104ed13] ? perf_event_task_sched_out+0x33/0x80
  [81168779] ? fget_light+0x9/0x90
  [813d570e] ? sys_recvfrom+0xee/0x180
  [810097ac] ? __switch_to+0x1ac/0x320
  [81476f75] ? thread_return+0x4e/0x779
  [8109aacb] ? sys_futex+0x7b/0x170
  [8100c5d5] ? math_state_restore+0x45/0x60
  [8100b132] ? system_call_fastpath+0x16/0x1b
 [ cut here ]
 kernel BUG at drivers/dma/ioat/dma_v2.c:315!
 
 In my sources that line is in ioat2_timer_event and it looks like it
 thinks a setup problem happened elsewhere.
 
 /* when halted due to errors check for channel
  * programming errors before advancing the completion state
  */
 if (is_ioat_halted(status)) {
         u32 chanerr;
 
         chanerr = readl(chan->reg_base + IOAT_CHANERR_OFFSET);
         dev_err(to_dev(chan), "%s: Channel halted (%x)\n",
                 __func__, chanerr);
         if (test_bit(IOAT_RUN, &chan->state))
                 BUG_ON(is_ioat_bug(chanerr));
         else /* we never got off the ground */
                 return;
 }
 
 Thanks much,
 
 
 



Re: [E1000-devel] [PATCH RFC 0/2] e1000e: 82574 also needs ASPM L1 completely disabled

2012-04-23 Thread Jesse Brandeburg
On Mon, 23 Apr 2012 22:29:36 +0100
Chris Boot bo...@bootc.net wrote:
 Please note I haven't as-yet tested this code at all, but I do know that
 disabling ASPM L1 on these NICs (using setpci) fixes the hangs that I
 have been seeing on my Supermicro servers with X9SCL-F boards. I hope to
 get the chance to install an updated kernel on my two affected servers
 later this week.
 
 Chris Boot (2):
   e1000e: Disable ASPM L1 on 82574
   e1000e: Remove special case for 82573/82574 ASPM L1 disablement

Thanks Chris, we are going to take a look over the patches and Jeff
Kirsher should apply them to our internal testing tree.

Please let us know the results of your testing, we will let you know if
we see any issues as well.

Jesse




Re: [E1000-devel] [PATCH] e1000e: MSI interrupt test failed, using legacy interrupt

2012-04-19 Thread Jesse Brandeburg
On Thu, 19 Apr 2012 10:59:47 -0700
Prasanna Panchamukhi ppanchamu...@riverbed.com wrote:

 On 04/19/2012 08:54 AM, Allan, Bruce W wrote:
  We have not seen a report of this issue before.  Please provide details on 
  the NIC or LOM and system/chipset on which the problem occurs and how the 
  additional 50ms was determined.
 
 This has been seen mostly with Intel 82571 Dual port Gigabit Ethernet 
 MAC+PHY of Intel Controller. Add-on NICs. Even 80ms works
 but to be safer I increased to 100ms. This issue has been seen when 
 multiple PCI-E add-on NICs with dual ports are inserted.

In what system?  We ask because simply increasing a delay like this
often will not solve all bugs in this path, and without a root cause it
is difficult to justify the patch.



Re: [E1000-devel] can't enable Flow Control for e1000e / 82572EI

2012-04-10 Thread Jesse Brandeburg
On Tue, 10 Apr 2012 14:54:02 +0200
Marko Kobal marko.ko...@arctur.si wrote:

 Hi,
 
 I have a Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 
 06) in my CentOS 5.7 (2.6.18-274.7.1.el5 x86_64) box.
 
 I have installed the latest drivers (e1000e-1.10.6.tar.gz) but can't enable 
 Flow Control:
 
 It seems like Flow Control is enabled when I load the driver, but after that 
 soon automatically disabled (?):
 
 load driver:
 
 # rmmod e1000e
 # modprobe e1000e
 # ethtool -a eth0
 Pause parameters for eth0:
 Autonegotiate:  on
 RX: on
 TX: on
 
 after 1 second:
 
 # ethtool -a eth0
 Pause parameters for eth0:
 Autonegotiate:  on
 RX: on
 TX: on
 
 after 5 seconds:

BTW this is after the typical 4 second autonegotiate link up for
gigabit.

 
 # ethtool -a eth0
 Pause parameters for eth0:
 Autonegotiate:  on
 RX: off
 TX: off
 
 (nothing in /var/log/messages)
 
 If I try to set it via ethtool
 
 # ethtool -A eth0 rx on tx on

I think our README covers this, but you need to do:
# ethtool -A eth0 autoneg off rx on tx on 

Your switch or link partner is advertising it doesn't support flow
control, so we are honoring it and turning it off.  You can override as
per the above, but you are probably not going to get the behavior you
want unless you have a network (subnet and switch) capable of flow
control, and have it on in your managed switch.
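
If you want to catch this state from a script, the `ethtool -a` output
quoted in this thread is easy to parse. A minimal sketch, assuming the
three-field output format shown here; the function names are made up for
illustration:

```python
def parse_pause(text):
    """Parse `ethtool -a` output into a dict such as
    {'autonegotiate': True, 'rx': False, 'tx': False}."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip().lower()
        # Header lines like "Pause parameters for eth0:" have no on/off value.
        if value in ("on", "off"):
            state[key] = (value == "on")
    return state


def pause_negotiated_off(state):
    """True when autoneg is on but both pause directions ended up off,
    which is what you see when the link partner does not advertise pause."""
    return bool(state.get("autonegotiate")
                and not state.get("rx")
                and not state.get("tx"))
```

When pause_negotiated_off() returns True, look at the switch port
configuration first before forcing anything on the host side.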

 # ethtool -a eth0
 Pause parameters for eth0:
 Autonegotiate:  on
 RX: off
 TX: off
 
 (I see e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: 
 None)
 
 I've even tried to force it as a parameter in /etc/modprobe.conf:
 options e1000e FlowControl=3
 
 but I get e1000e: Unknown parameter 'FlowControl' ...

The kernel driver you're using doesn't support the FlowControl
parameter, and we generally expect ethtool to be used.



Re: [E1000-devel] Intel 82574L rx_short_length_errors

2012-04-03 Thread Jesse Brandeburg
On Tue, 3 Apr 2012 17:49:25 +0300
Aleksey Chudov aleksey.chu...@gmail.com wrote:
 I have few identical low end servers with the following integrated NICs:
 02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
 Connection
 03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
 Connection

 On all servers is constantly increasing counters rx_errors, rx_length_errors
 and rx_short_length_errors. 
 # ethtool -S eth0
 NIC statistics:
  rx_packets: 441142143
  tx_packets: 640189607
  rx_bytes: 51863921636
  tx_bytes: 754569587969
  rx_broadcast: 158
  tx_broadcast: 1
  rx_errors: 13
  rx_length_errors: 13
  rx_short_length_errors: 13
  tx_tcp_seg_good: 106335289
  rx_long_byte_count: 51863921636
  rx_csum_offload_good: 441123474
  rx_csum_offload_errors: 4958


 I tried the following settings 82574L:
...
 It seems that the number of errors does not depend on configuration or
 driver version or the amount of traffic.

All the troubleshooting items listed above are typically for
different issues than you are having.

 Then I tried to insert in one server additional NIC 82576 connected to the
 same switch through the same patch cable and errors completely disappeared.

That indicates you might be having some issue at the physical
layer (the PHY) on the part.  What kind of switch, and what does
'ethtool eth0' report for the negotiated settings?

  Does anyone have any idea why there are errors with 82574L NIC ?

Your error rate is extremely small, and in my experience most
internet-connected machines see some amount of bogus input.
That said, you didn't have issues with the 82576, which has a
completely different PHY.
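
To put a number on "extremely small": with the counters quoted above
(rx_errors 13, rx_packets 441142143) the rate works out to roughly 3
errors per 100 million received packets. A minimal sketch of that
arithmetic, assuming the "name: value" format that `ethtool -S` prints;
the helper names are made up for illustration:

```python
def parse_stats(text):
    """Parse `ethtool -S` output ("name: value" lines) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skips the "NIC statistics:" header, which has no numeric value.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_error_rate(stats):
    """rx_errors divided by rx_packets, as a rough error fraction."""
    return stats["rx_errors"] / stats["rx_packets"]
```

Sampling this twice, some hours apart, shows whether the errors track
traffic volume or wall-clock time, which helps separate a marginal
physical link from occasional bad input.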

The rx_short_length_errors are extremely unusual and are another
strong indication of a physical layer problem (bad cable,
something else gone wrong with the LAN controller PHY, possibly
negotiation of tx or rx power having issues).

is the cable run very long?

snip

please give us the output of ethtool -e from one of your 82574L having
issues.



Re: [E1000-devel] Fw: Enable VF only on 2nd 82576 port

2012-03-30 Thread Jesse Brandeburg
Try 
modprobe igb max_vfs=-1,1

--
Spelling via autocorrect, please fogrive me

On Mar 30, 2012, at 3:10 AM, Jemma Jones jemmajone...@yahoo.co.uk wrote:

 
 
 If I load the driver with 
 
 modprobe igb max_vfs=0,1
 
 which would mean 0 VFs in PF 0 and 1 VF on PF 1 then I get an error saying 
 0,1 invalid for parameter max_vfs.
 
 So it's not working this way.
 
 
 
 
 
 From: Wyborny, Carolyn carolyn.wybo...@intel.com
 To: Jemma Jones jemmajone...@yahoo.co.uk; 
 e1000-devel@lists.sourceforge.net e1000-devel@lists.sourceforge.net 
 Sent: Thursday, 22 March 2012, 15:38
 Subject: RE: [E1000-devel] Enable VF only on 2nd 82576 port
 
 
 
 -Original Message-
 From: Jemma Jones [mailto:jemmajone...@yahoo.co.uk]
 Sent: Thursday, March 22, 2012 8:26 AM
 To: e1000-devel@lists.sourceforge.net
 Subject: [E1000-devel] Enable VF only on 2nd 82576 port
 
 Hi, I've got a 82576 card with 2 ports.
 
 
 They show up on my system as 2 physical functions:
 
 04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
 Connection (rev 01)
 04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
 Connection (rev 01)
 
 Now I would like to enable 1 VF on the 2nd port of the device.
 When I load the driver with
 
 modprobe igb max_vfs=1
 
 
 Then I get 1 VF at the first port.
 
 04:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function
 (rev 01)
 
 How can I load the driver to get 1 VF at the 2nd port?
 
 Cheers,
 Jemma
 
 You need to enter one parameter for each port:  modprobe igb max_vfs=1,1
 
 Thanks,
 
 Carolyn
 
 Carolyn Wyborny
 Linux Development
 LAN Access Division
 Intel Corporation



Re: [E1000-devel] Intel e1000e crashes on high throughput

2012-03-05 Thread Jesse Brandeburg
please use e1000-devel@lists.sourceforge.net on all future replies.



On Sun, 2012-03-04 at 23:42 -0500, Marcelo Pereira wrote:
 Hello,
 
 
 I have been struggling to use a NIC Intel e1000e, without success, for
 days!!
 
 
 I'm using the latest version of the driver (1.9.5), and the kernel of the
 server is 2.6.18.
 
 
 It goes up and works pretty well, until I need to do some heavy
 procedure (DRBD sync process, for example).
 
 
 The ethtool output doesn't say anything weird. However, all of a
 sudden, I have a gazillion errors on the interface:

this is an error pattern we have likely seen before, but we need more
info before we can make suggestions.

you didn't mention any of the regular details we need.

lspci -vvv
ethtool -e eth2
dmidecode
full dmesg from boot

all these things should be attached to a new bug at
https://sourceforge.net/tracker/?group_id=42302&atid=447449
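
One observation on the post-crash counters quoted below (this is only
arithmetic on the posted numbers, not something the driver reports): many
of them are exact multiples of 0xFFFFFFFF, e.g. 206158430160 = 48 *
4294967295. A 32-bit register read that returns all ones is commonly what
you get when a PCI/PCIe device has stopped responding, so the statistics
may have been accumulated from failed reads rather than real traffic.
Treat that as an assumption until lspci and dmesg confirm it. A minimal
sketch of the check; the function name is made up for illustration:

```python
ALL_ONES = 0xFFFFFFFF  # value of a 32-bit read from a device that does not answer


def all_ones_multiples(stats):
    """Return {counter_name: n} for non-zero counters equal to n * 0xFFFFFFFF."""
    hits = {}
    for name, value in stats.items():
        if value > 0 and value % ALL_ONES == 0:
            hits[name] = value // ALL_ONES
    return hits
```

Fed the second ethtool -S dump from this message, counters such as
rx_crc_errors come back as 48 and rx_errors as 288, while rx_packets does
not match the pattern.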


 
 
 # ifconfig eth2
 eth2  Link encap:Ethernet  HWaddr 68:05:CA:01:F6:FF  
   inet addr:192.168.69.1  Bcast:192.168.69.255
  Mask:255.255.255.0
   inet6 addr: fe80::6a05:caff:fe01:f6ff/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:38562086 errors:9251359553430
 dropped:1541893258905 overruns:0 frame:6167573035620
   TX packets:141830787 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000 
   RX bytes:3429005890 (3.1 GiB)  TX bytes:212307672425 (197.7
 GiB)
   Interrupt:177 Memory:c6fe-c700 
 
 
 # ethtool -S eth2
 NIC statistics:
  rx_packets: 206196987401
  tx_packets: 206300243141
  rx_bytes: 209741279230
  tx_bytes: 418498371679
  rx_broadcast: 206158430780
  tx_broadcast: 206158430233
  rx_multicast: 206158430196
  tx_multicast: 206158430206
  rx_errors: 1236950580960
  tx_errors: 0
  tx_dropped: 0
  multicast: 206158430196
  collisions: 0
  rx_length_errors: 412316860320
  rx_over_errors: 0
  rx_crc_errors: 206158430160
  rx_frame_errors: 206158430160
  rx_no_buffer_count: 206158430160
  rx_missed_errors: 206158430160
  tx_aborted_errors: 0
  tx_carrier_errors: 0
  tx_fifo_errors: 0
  tx_heartbeat_errors: 0
  tx_window_errors: 0
  tx_abort_late_coll: 0
  tx_deferred_ok: 0
  tx_single_coll_ok: 0
  tx_multi_coll_ok: 0
  tx_timeout_count: 0
  tx_restart_queue: 0
  rx_long_length_errors: 206158430160
  rx_short_length_errors: 206158430160
  rx_align_errors: 206158430160
  tx_tcp_seg_good: 206171367854
  tx_tcp_seg_failed: 206158430160
  rx_flow_control_xon: 206158430160
  rx_flow_control_xoff: 206158430160
  tx_flow_control_xon: 206158430160
  tx_flow_control_xoff: 206158430160
  rx_long_byte_count: 209741279230
  rx_csum_offload_good: 38561480
  rx_csum_offload_errors: 0
  rx_header_split: 0
  alloc_rx_buff_failed: 0
  tx_smbus: 206158430160
  rx_smbus: 206158430160
  dropped_smbus: 206158430160
  rx_dma_failed: 0
  tx_dma_failed: 0
 
 
 Just for the records, here is the ethtool's output, a couple of
 seconds before the crash:
 
 
 # ethtool -S eth2
 NIC statistics:
  rx_packets: 568137905
  tx_packets: 154624696
  rx_bytes: 849530810286
  tx_bytes: 14357782180
  rx_broadcast: 5193
  tx_broadcast: 387
  rx_multicast: 283
  tx_multicast: 102
  rx_errors: 0
  tx_errors: 0
  tx_dropped: 0
  multicast: 283
  collisions: 0
  rx_length_errors: 0
  rx_over_errors: 0
  rx_crc_errors: 0
  rx_frame_errors: 0
  rx_no_buffer_count: 0
  rx_missed_errors: 0
  tx_aborted_errors: 0
  tx_carrier_errors: 0
  tx_fifo_errors: 0
  tx_heartbeat_errors: 0
  tx_window_errors: 0
  tx_abort_late_coll: 0
  tx_deferred_ok: 0
  tx_single_coll_ok: 0
  tx_multi_coll_ok: 0
  tx_timeout_count: 0
  tx_restart_queue: 0
  rx_long_length_errors: 0
  rx_short_length_errors: 0
  rx_align_errors: 0
  tx_tcp_seg_good: 119928
  tx_tcp_seg_failed: 0
  rx_flow_control_xon: 0
  rx_flow_control_xoff: 0
  tx_flow_control_xon: 0
  tx_flow_control_xoff: 0
  rx_long_byte_count: 849530810286
  rx_csum_offload_good: 568133159
  rx_csum_offload_errors: 0
  rx_header_split: 0
  alloc_rx_buff_failed: 0
  tx_smbus: 0
  rx_smbus: 0
  dropped_smbus: 0
  rx_dma_failed: 0
  tx_dma_failed: 0
 
 
 I have already tried to turn auto negotiate on and off. I have set it
 up to use Flow Control (ethtool -A eth2 rx on tx on).
 
 
 The dmesg output says: e1000e: eth2 NIC Link is Up 1000 Mbps Full
 Duplex, Flow Control: None
 
 
 Several tests (more than 40min each). Several reboots (sometimes, it
 crashes so badly that it freeze the server, and I have to
 hard-reboot). Nothing can help me out with these NICs. I have two
 identical servers, and I just need them to communicate to 

Re: [E1000-devel] rx_csum_offload_errors counter questions

2012-02-10 Thread Jesse Brandeburg
On Fri, 2012-02-10 at 19:25 +0700, Bokhan Artem wrote:
 Any thoughts? May be somebody can point to description?
 
 On 09.02.2012 16:49, Bokhan Artem wrote:
  Hello.
 
  I have several questions about rx_csum_offload_errors counter for igb and 
  ixgbe
  drivers:
 
  What type of errors rx_csum_offload_errors counter consists of?

rx_csum_offload_errors counts the errored packets that the device
passes all the way up to the receive routine.  L2 errors are dropped in
hardware by the receive filter and counted in the IXGBE_CRCERRS
register, which is folded into the rx_errors counter shown by ifconfig.

 
  Does it count L2 or L3 errors?
This counter is mostly for L3 or L4 errors (IP checksum, TCP checksum)

 
  Does the driver pass packets with bad csums to OS?

yes, we don't mark the checksum as offloaded, and hand the packet to the
stack to recheck/account for/drop.
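
For reference, the recheck the stack does on the IPv4 header is the
standard ones'-complement sum from RFC 1071. A minimal sketch of that
calculation (this illustrates the algorithm only, it is not the kernel or
driver code, and the function names are made up):

```python
def ipv4_header_checksum(header: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of 16-bit words (RFC 1071)."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero."""
    return ipv4_header_checksum(header) == 0
```

A packet that fails this kind of check is the sort that shows up in
rx_csum_offload_errors and is then dropped by the upper layers.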

 
  Does the driver counts packets with bad csums which will be routed then?

it is unlikely that the packet will be routed as it will likely be
dropped by the upper layers.

Sorry for the delay!





Re: [E1000-devel] High number of rx_missed_errors when chaning from 1.0.2-k2 to 1.2.20-k2/1.5.1-k

2012-01-26 Thread Jesse Brandeburg
On Thu, 2012-01-26 at 02:06 -0800, Carsten Aulbert wrote:

 with 1.0.2-k2 and default options (except crcstripping=0) we get close to 120 
 MB/s and no dropped packets.
 
 rebooting the system to a kernel with a newer driver yields only 150-250kB/s 
 throughput and a packet drop-rate close to 20%..
 

Hi Carsten, it sounds to me like this might be related to ASPM. Can you
try the boot option pcie_aspm=off?

Before you do that, please capture the output of lspci -vvv and attach it
to the bug (or send it here, I suppose). Also include the ethtool -e ethX
output as an attachment; I'm interested to see some settings in your
eeprom.

 I'm attaching quite a number of files to this post, but would like to learn 
 how to find out, what's wrong and how to fix it.
 
 This error seemed to be popping up here and there on this list and elsewhere, 
 but so far I've yet to find a definite answer ...

as John said, rx_missed with no rx_no_buffer_count means that you're
dropping packets in hardware. That typically means something is going
wrong at the bus level or the PCIe transaction level and ends up
delaying packets, for example long memory latencies (these are just
typical problems, I'm not saying this is exactly your issue).

aspm is one of those causes, there can be others
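
The rule of thumb above can be written down as a small triage helper. A
minimal sketch, assuming a dict of `ethtool -S` counters as input; the
function name and the returned strings are made up for illustration, and
the "both non-zero" interpretation is my assumption, not something stated
in this thread:

```python
def classify_rx_drops(stats):
    """Rough triage of receive drops from ethtool -S style counters."""
    missed = stats.get("rx_missed_errors", 0)
    no_buffer = stats.get("rx_no_buffer_count", 0)
    if missed == 0:
        return "no hardware drops counted"
    if no_buffer == 0:
        # Drops in hardware while buffers were available: look at the bus,
        # PCIe transactions, memory latency, ASPM.
        return "bus/PCIe latency suspected"
    # Assumption: both counters rising suggests the host is not refilling
    # receive buffers fast enough.
    return "host rx buffer starvation suspected"
```

This only narrows down where to look; lspci -vvv and the eeprom dump are
still needed to say anything definite.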




Re: [E1000-devel] ixgbe: Unsupported SFP+ modules on 10Gbit/s X520-DA2 NIC?

2012-01-18 Thread Jesse Brandeburg
On Wed, 18 Jan 2012 03:30:58 -0800
Jesper Dangaard Brouer h...@comx.dk wrote:
 I just bought three 10Gbit/s X520-DA2 NICs (82599 based) for
 production usage, but I cannot get them to accept any of my 10Gbit/s
 SFP+ modules (4 different tried). According to the documentation I can
 find, the X520-DA2 NIC should support fiber optics SFP+ modules.
Hi Jesper, 

For X520 adapters, the documentation[1] states which SFP+ modules
are and are not supported.  Direct attach cables are also
supported.

[1] http://www.intel.com/support/network/adapter/pro100/sb/CS-030612.htm

 The SFP+ modules does work in another 82599 based NIC in the same
 machine (engineering sample from PJ).

Sorry, can't help you with that one, those samples are different
hardware.
 
Jesse



Re: [E1000-devel] [PATCH net-next 2/2] igb: offer a PTP Hardware Clock instead of the timecompare method

2011-12-14 Thread Jesse Brandeburg
On Mon, 2011-12-12 at 19:00 -0800, Richard Cochran wrote:
 This commit removes the legacy timecompare code from the igb driver and
 offers a tunable PHC instead.
 
 Signed-off-by: Richard Cochran richardcoch...@gmail.com

Richard, first, thanks for this work, I have some feedback and request
you make a V2.

 -   /*
 -* The timestamp latches on lowest register read. For the 82580
 -* the lowest register is SYSTIMR instead of SYSTIML.  However we 
 never
 -* adjusted TIMINCA so SYSTIMR will just read as all 0s so ignore it.
 -*/

Please keep this comment in your new igb_82580_systim_read, it explains
a bit of *why* we are doing something.  There were a lot of explanatory
comments that you removed, please audit the - lines of your patch and
add back the comments that are appropriate in your new code. 
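
To illustrate the kind of "why" those comments carry: the removed read
routine builds one 64-bit value out of the SYSTIMR/L/H registers, shifted
by 24 on 82580-class parts. A minimal model of that composition in
Python, assuming the shift of 24 named in the removed comments; the
function and its is_82580 flag are made up for illustration and are not
driver code:

```python
IGB_82580_TSYNC_SHIFT = 24  # shift named in the removed comment block
MASK64 = (1 << 64) - 1


def systim_to_stamp(systimr, systiml, systimh, is_82580=True):
    """Combine the three 32-bit SYSTIM register values into one 64-bit stamp."""
    stamp, shift = 0, 0
    if is_82580:
        # The residue register only contributes its upper 24 bits.
        stamp = (systimr & 0xFFFFFFFF) >> 8
        shift = IGB_82580_TSYNC_SHIFT
    stamp |= (systiml & 0xFFFFFFFF) << shift
    stamp |= (systimh & 0xFFFFFFFF) << (shift + 32)
    # Model the truncation a u64 would apply in C.
    return stamp & MASK64
```

Without the comment, a reader of the new code has no way to tell why
SYSTIMR is read first or why it can be ignored when it always reads as
zero.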

 -   if (hw->mac.type >= e1000_82580) {
 -   stamp = rd32(E1000_SYSTIMR) >> 8;
 -   shift = IGB_82580_TSYNC_SHIFT;
 -   }
 -
 -   stamp |= (u64)rd32(E1000_SYSTIML) << shift;
 -   stamp |= (u64)rd32(E1000_SYSTIMH) << (shift + 32);
 -   return stamp;
 -}
 -
  /**
   * igb_get_hw_dev - return device
   * used by hardware layer to print debugging information
 @@ -2080,7 +2052,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 
  #endif
 /* do hw tstamp init after resetting */
 -   igb_init_hw_timer(adapter);
 +   igb_ptp_init(adapter);
 
 dev_info(&pdev->dev, "Intel(R) Gigabit Ethernet Network 
 Connection\n");
 /* print bus type/speed/width info */
 @@ -2150,6 +2122,8 @@ static void __devexit igb_remove(struct pci_dev *pdev)
 struct igb_adapter *adapter = netdev_priv(netdev);
 struct e1000_hw *hw = &adapter->hw;
 
 +   igb_ptp_remove(adapter);
 +
 /*
  * The watchdog timer may be rescheduled, so explicitly
  * disable watchdog from being rescheduled.
 @@ -2269,112 +2243,6 @@ out:
  }
 
  /**
 - * igb_init_hw_timer - Initialize hardware timer used with IEEE 1588 
 timestamp
 - * @adapter: board private structure to initialize
 - *
 - * igb_init_hw_timer initializes the function pointer and values for the hw
 - * timer found in hardware.
 - **/
 -static void igb_init_hw_timer(struct igb_adapter *adapter)
 -{
 -   struct e1000_hw *hw = &adapter->hw;
 -
 -   switch (hw->mac.type) {
 -   case e1000_i350:
 -   case e1000_82580:
 -   memset(&adapter->cycles, 0, sizeof(adapter->cycles));
 -   adapter->cycles.read = igb_read_clock;
 -   adapter->cycles.mask = CLOCKSOURCE_MASK(64);
 -   adapter->cycles.mult = 1;
 -   /*
 -* The 82580 timesync updates the system timer every 8ns by 
 8ns
 -* and the value cannot be shifted.  Instead we need to shift
 -* the registers to generate a 64bit timer value.  As a result
 -* SYSTIMR/L/H, TXSTMPL/H, RXSTMPL/H all have to be shifted by
 -* 24 in order to generate a larger value for synchronization.
 -*/
 -   adapter->cycles.shift = IGB_82580_TSYNC_SHIFT;
 -   /* disable system timer temporarily by setting bit 31 */
 -   wr32(E1000_TSAUXC, 0x80000000);
 -   wrfl();
 -
 -   /* Set registers so that rollover occurs soon to test this. */
 -   wr32(E1000_SYSTIMR, 0x00000000);
 -   wr32(E1000_SYSTIML, 0x80000000);
 -   wr32(E1000_SYSTIMH, 0x000000FF);
 -   wrfl();
 -
 -   /* enable system timer by clearing bit 31 */
 -   wr32(E1000_TSAUXC, 0x0);
 -   wrfl();
 -
 -   timecounter_init(&adapter->clock,
 -&adapter->cycles,
 -ktime_to_ns(ktime_get_real()));
 -   /*
 -* Synchronize our NIC clock against system wall clock. NIC
 -* time stamp reading requires ~3us per sample, each sample
 -* was pretty stable even under load => only require 10
 -* samples for each offset comparison.
 -*/
 -   memset(&adapter->compare, 0, sizeof(adapter->compare));
 -   adapter->compare.source = &adapter->clock;
 -   adapter->compare.target = ktime_get_real;
 -   adapter->compare.num_samples = 10;
 -   timecompare_update(&adapter->compare, 0);
 -   break;
 -   case e1000_82576:
 -   /*
 -* Initialize hardware timer: we keep it running just in case
 -* that some program needs it later on.
 -*/
 -   memset(&adapter->cycles, 0, sizeof(adapter->cycles));
 -   adapter->cycles.read = igb_read_clock;
 -   adapter->cycles.mask = CLOCKSOURCE_MASK(64);
 -   adapter->cycles.mult = 1;
 -   /**
 -* Scale the NIC clock cycle by a 

Re: [E1000-devel] interface counters

2011-12-13 Thread Jesse Brandeburg
On Tue, 2011-12-13 at 04:16 -0800, Bokhan Artem wrote:
 Hello!
 
 Is it possible to update interface counters more often than every 2 secs? 
 Probably with some changes of source.

yes it is possible and in fact several drivers do update a small set of
stats in real time, or when called via the update_stats entry point.

what driver were you curious about?




Re: [E1000-devel] WARNING: at net/core/dev.c:1904 skb_gso_segment+0x146/0x298()

2011-12-05 Thread Jesse Brandeburg
cc: e1000-devel

On Wed, 23 Nov 2011 16:30:34 -0800
Paweł Staszewski pstaszew...@itcare.pl wrote:
 After upgrade from 2.6.38.2 to 3.1.2 I have this in dmesg:
 [  600.266497] WARNING: at net/core/dev.c:1904 skb_gso_segment+0x146/0x298()
 [  600.266500] Hardware name: X8DTU-6+
 [  600.266503] 802.1Q VLAN Support: caps=(0x20115833, 0x0) len=2816 
 data_len=2776 ip_summed=1

it seems no-one ever replied, can you give us more details about the
traffic and network configuration that reproduces the panic?

what does the output of 'ip address' show? vconfig?

it seems as if GRO is pushing a packet into the stack to be forwarded
that gso is mad about due to checksum != CHECKSUM_PARTIAL, esp when
stacked upon macvlan and/or vlan:

see dev.c:1904 in 3.1 kernel
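
To make the warning line easier to read: the ip_summed value it prints is
one of the CHECKSUM_* constants from skbuff.h (0 NONE, 1 UNNECESSARY, 2
COMPLETE, 3 PARTIAL), and the WARN fires when it is anything other than
CHECKSUM_PARTIAL. A minimal sketch that decodes the line, with a made-up
helper name:

```python
import re

# CHECKSUM_* values as defined in include/linux/skbuff.h for this kernel era.
CHECKSUM_NAMES = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}


def decode_ip_summed(warning_line):
    """Return the CHECKSUM_* name for the ip_summed field, or None if absent."""
    match = re.search(r"ip_summed=(\d+)", warning_line)
    if match is None:
        return None
    return CHECKSUM_NAMES.get(int(match.group(1)), "unknown")
```

For the line quoted above this gives CHECKSUM_UNNECESSARY, i.e. the
checksum state of a received packet rather than one prepared for
transmit, which fits a forwarded GRO packet.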


 [  600.266506] Modules linked in: macvlan
 [  600.266511] Pid: 0, comm: kworker/0:1 Not tainted 3.1.2 #1
 [  600.266513] Call Trace:
 [  600.266515] IRQ  [8103449c] warn_slowpath_common+0x80/0x98
 [  600.266527]  [81034548] warn_slowpath_fmt+0x41/0x43
 [  600.266530]  [813c838f] skb_gso_segment+0x146/0x298
 [  600.266535]  [8103994e] ? local_bh_enable+0xd/0xf
 [  600.266540]  [813cc646] dev_hard_start_xmit+0x35a/0x57d
 [  600.266544]  [8103994e] ? local_bh_enable+0xd/0xf
 [  600.266548]  [813cccb2] dev_queue_xmit+0x449/0x4ef
 [  600.266554]  [813ffc4d] ip_finish_output2+0x1c4/0x201
 [  600.266560]  [813ffd1c] ip_finish_output+0x92/0x97
 [  600.266562]  [813ffe7c] T.1037+0x4f/0x56
 [  600.266565]  [8145] ip_output+0x58/0x5b
 [  600.266567]  [813fc4f0] ip_forward_finish+0x44/0x48
 [  600.266569]  [813fc7f4] ip_forward+0x300/0x36c
 [  600.266572]  [813fb144] ip_rcv_finish+0x2a4/0x2ce
 [  600.266575]  [813faea0] ? inet_del_protocol+0x37/0x37
 [  600.266577]  [813fb431] T.935+0x4c/0x53
 [  600.266579]  [813fb6bc] ip_rcv+0x237/0x263
 [  600.266582]  [813cb76b] __netif_receive_skb+0x41d/0x44f
 [  600.266584]  [813cb891] process_backlog+0xf4/0x1d3
 [  600.266587]  [813cbfee] net_rx_action+0x74/0x1cb
 [  600.266589]  [81039a74] __do_softirq+0xc8/0x1a4
 [  600.266591]  [81039b31] ? __do_softirq+0x185/0x1a4
 [  600.266595]  [814a8bec] call_softirq+0x1c/0x30
 [  600.266599]  [8100385d] do_softirq+0x41/0x7e
 [  600.266601]  [8103987b] irq_exit+0x44/0x74
 [  600.266603]  [81003182] do_IRQ+0x98/0xaf
 [  600.266606]  [814a0f2e] common_interrupt+0x6e/0x6e
 [  600.266608] EOI  [8100887e] ? mwait_idle+0x7e/0xa4
 [  600.266613]  [81008836] ? mwait_idle+0x36/0xa4
 [  600.266615]  [81001da7] cpu_idle+0x5f/0x91
 [  600.266620]  [81acca55] start_secondary+0x192/0x196
 [  600.266622] ---[ end trace 15512840060b2da9 ]---
 
 Network controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network 
 Connection (rev 01)
 ethtool -i eth4
 driver: ixgbe
 version: 3.4.8-k
 firmware-version: 1.12-2
 bus-info: :04:00.0
 
 ethtool -k eth4
 Offload parameters for eth4:
 rx-checksumming: on
 tx-checksumming: on
 scatter-gather: on
 tcp-segmentation-offload: on
 udp-fragmentation-offload: off
 generic-segmentation-offload: on
 generic-receive-offload: on
 large-receive-offload: off
 ntuple-filters: off
 receive-hashing: on




Re: [E1000-devel] [BUG] e1000: possible deadlock scenario caught by lockdep

2011-11-18 Thread Jesse Brandeburg
CC'd netdev, and e1000-devel

On Thu, 17 Nov 2011 17:27:00 -0800
Steven Rostedt rost...@goodmis.org wrote:

 I hit the following lockdep splat:
 
 ==
 [ INFO: possible circular locking dependency detected ]
 3.2.0-rc2-test+ #14
 ---
 reboot/2316 is trying to acquire lock:
  ((&(&adapter->watchdog_task)->work)){+.+...}, at: [81069553] wait_on_work+0x0/0xac
 
 but task is already holding lock:
  (&adapter->mutex){+.+...}, at: [81359b1d] __e1000_shutdown+0x56/0x1f5
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (&adapter->mutex){+.+...}:
[8108261a] lock_acquire+0x103/0x158
[8150bcf3] __mutex_lock_common+0x6a/0x441
[8150c13d] mutex_lock_nested+0x1b/0x1d
[81359288] e1000_watchdog+0x56/0x4a4
[8106a1b0] process_one_work+0x1ef/0x3e0
[8106b4e0] worker_thread+0xda/0x15e
[8106f00e] kthread+0x9f/0xa7
[81514e84] kernel_thread_helper+0x4/0x10
 
 -> #0 ((&(&adapter->watchdog_task)->work)){+.+...}:
[81081e4a] __lock_acquire+0xa29/0xd06
[8108261a] lock_acquire+0x103/0x158
[81069590] wait_on_work+0x3d/0xac
[8106a616] __cancel_work_timer+0xb9/0xff
[8106a66e] cancel_delayed_work_sync+0x12/0x14
[81355c8f] e1000_down_and_stop+0x2e/0x4a
[813581ed] e1000_down+0x116/0x176
[81359b4a] __e1000_shutdown+0x83/0x1f5
[81359cd6] e1000_shutdown+0x1a/0x43
[8126fdad] pci_device_shutdown+0x29/0x3d
[8130c601] device_shutdown+0xbe/0xf9
[81065b17] kernel_restart_prepare+0x31/0x38
[81065b32] kernel_restart+0x14/0x51
[81065cd8] sys_reboot+0x157/0x1b0
[81513882] system_call_fastpath+0x16/0x1b
 
 other info that might help us debug this:
 
  Possible unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(&adapter->mutex);
                                lock((&(&adapter->watchdog_task)->work));
                                lock(&adapter->mutex);
   lock((&(&adapter->watchdog_task)->work));
 
  *** DEADLOCK ***
 
 2 locks held by reboot/2316:
  #0:  (reboot_mutex){+.+.+.}, at: [81065c20] sys_reboot+0x9f/0x1b0
  #1:  (&adapter->mutex){+.+...}, at: [81359b1d] __e1000_shutdown+0x56/0x1f5
 
 stack backtrace:
 Pid: 2316, comm: reboot Not tainted 3.2.0-rc2-test+ #14
 Call Trace:
  [81503eb2] print_circular_bug+0x1f8/0x209
  [81081e4a] __lock_acquire+0xa29/0xd06
  [81069553] ? wait_on_cpu_work+0x94/0x94
  [8108261a] lock_acquire+0x103/0x158
  [81069553] ? wait_on_cpu_work+0x94/0x94
  [810c7caf] ? trace_preempt_on+0x2a/0x2f
  [81069590] wait_on_work+0x3d/0xac
  [81069553] ? wait_on_cpu_work+0x94/0x94
  [8106a616] __cancel_work_timer+0xb9/0xff
  [8106a66e] cancel_delayed_work_sync+0x12/0x14
  [81355c8f] e1000_down_and_stop+0x2e/0x4a
  [813581ed] e1000_down+0x116/0x176
  [81359b4a] __e1000_shutdown+0x83/0x1f5
  [8150d51c] ? _raw_spin_unlock+0x33/0x56
  [8130c583] ? device_shutdown+0x40/0xf9
  [81359cd6] e1000_shutdown+0x1a/0x43
  [81510757] ? sub_preempt_count+0xa1/0xb4
  [8126fdad] pci_device_shutdown+0x29/0x3d
  [8130c601] device_shutdown+0xbe/0xf9
  [81065b17] kernel_restart_prepare+0x31/0x38
  [81065b32] kernel_restart+0x14/0x51
  [81065cd8] sys_reboot+0x157/0x1b0
  [81072ccb] ? hrtimer_cancel+0x17/0x24
  [8150c304] ? do_nanosleep+0x74/0xac
  [8125c72d] ? trace_hardirqs_off_thunk+0x3a/0x3c
  [8150e066] ? error_sti+0x5/0x6
  [810c7c80] ? time_hardirqs_off+0x2a/0x2f
  [8125c6ee] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [8150db5d] ? retint_swapgs+0x13/0x1b
  [8150db5d] ? retint_swapgs+0x13/0x1b
  [81082a78] ? trace_hardirqs_on_caller+0x12d/0x164
  [810a74ce] ? audit_syscall_entry+0x11c/0x148
  [8125c6ee] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [81513882] system_call_fastpath+0x16/0x1b
 
 
 The issue comes from two recent commits:
 
 commit a4010afef585b7142eb605e3a6e4210c0e1b2957
 Author: Jesse Brandeburg jesse.brandeb...@intel.com
 Date:   Wed Oct 5 07:24:41 2011 +
 e1000: convert hardware management from timers to threads
 
 and
 
 commit 0ef4eedc2e98edd51cd106e1f6a27178622b7e57
 Author: Jesse Brandeburg jesse.brandeb...@intel.com
 Date:   Wed Oct 5 07:24:51 2011 +
 e1000: convert to private mutex from rtnl
 
 
 What we have in __e1000_shutdown() is:
 
   mutex_lock(&adapter->mutex);
 
   if (netif_running(netdev)) {
           WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags

Re: [E1000-devel] [BUG] e1000: possible deadlock scenario caught by lockdep

2011-11-18 Thread Jesse Brandeburg
On Fri, 18 Nov 2011 08:57:37 -0800
Jesse Brandeburg jesse.brandeb...@intel.com wrote:

 CC'd netdev, and e1000-devel
 
 On Thu, 17 Nov 2011 17:27:00 -0800
 Steven Rostedt rost...@goodmis.org wrote:
  Here you see that we are calling 
  cancel_delayed_work_sync(&adapter->watchdog_task);
  
  The problem is that adapter->watchdog_task grabs the mutex adapter->mutex.
  
  If the work has started and it blocked on that mutex, the
  cancel_delayed_work_sync() will block indefinitely and we have a
  deadlock.
  
  Not sure what's the best way around this. Can we call e1000_down()
  without grabbing the adapter->mutex?
 
 Thanks for the report, I'll look at it today and see if I can work out
 a way to avoid the bonk.

this is a proposed patch to fix the issue:
if it works for you please let me know and I will submit it officially
through our process

e1000: fix lockdep splat in shutdown handler

From: Jesse Brandeburg jesse.brandeb...@intel.com

As reported by Steven Rostedt, e1000 has a lockdep splat added
during the recent merge window.  The issue is that
cancel_delayed_work_sync is called while holding our private mutex,
which the watchdog work item also takes.

There is no reason that I can see to hold the mutex during pci
shutdown; it was more just paranoia that I put the mutex_lock
around the call to e1000_down.

In a quick survey, lots of drivers handle locking differently when
being called by the pci layer.  The assumption here is that we
don't need the mutex's protection in this function because
the driver could not be unloaded while in the shutdown handler,
which is only called at reboot or poweroff.

Reported-by: Steven Rostedt rost...@goodmis.org
Signed-off-by: Jesse Brandeburg jesse.brandeb...@intel.com
---

 drivers/net/ethernet/intel/e1000/e1000_main.c |8 +---
 1 files changed, 1 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index cf480b5..97b46ba 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -4716,8 +4716,6 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
 
 	netif_device_detach(netdev);
 
-	mutex_lock(&adapter->mutex);
-
 	if (netif_running(netdev)) {
 		WARN_ON(test_bit(__E1000_RESETTING, &adapter->flags));
 		e1000_down(adapter);
@@ -4725,10 +4723,8 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
 
 #ifdef CONFIG_PM
 	retval = pci_save_state(pdev);
-	if (retval) {
-		mutex_unlock(&adapter->mutex);
+	if (retval)
 		return retval;
-	}
 #endif
 
 	status = er32(STATUS);
@@ -4783,8 +4779,6 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
 	if (netif_running(netdev))
 		e1000_free_irq(adapter);
 
-	mutex_unlock(&adapter->mutex);
-
 	pci_disable_device(pdev);
 
 	return 0;



Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2011-10-25 Thread Jesse Brandeburg
On Mon, 24 Oct 2011 23:29:34 -0700
Michael Wang wang...@linux.vnet.ibm.com wrote:
 Maybe you can just search for the macro
 E1000_TXDCTL_DMA_BURST_ENABLE
 in drivers/net/e1000e/e1000.h and change it to:
 
 #define E1000_TXDCTL_DMA_BURST_ENABLE \
 (E1000_TXDCTL_GRAN | /* set descriptor granularity */ \
 E1000_TXDCTL_COUNT_DESC | \
 (0 << 16) | /* wthresh must be +1 more than desired */\
 (1 << 8) | /* hthresh */ \
 0x1f) /* pthresh */
 
 This will do the write-back even when only one descriptor has been
 completed. If that solves the problem, we can think about a good solution.

I can already tell you that this will fix the problem, but wthresh=1 is
more like the hardware default after reset I think.  Doing this will
prevent the bursting behavior that got us the performance improvement
this patch was made for, which is bad.
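
To make the field layout in that macro easier to follow, here is a minimal
sketch of how the three thresholds pack into one TXDCTL-style word. The
helper names are hypothetical, and the two flag values are assumptions
modeled on the e1000e defines, so check them against your tree before
relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, modeled on the e1000e defines; verify in your tree. */
#define TXDCTL_GRAN       0x01000000u /* thresholds counted in descriptors */
#define TXDCTL_COUNT_DESC 0x00400000u

/* Hypothetical helper: pack pthresh (bits 5:0), hthresh (bits 13:8)
 * and wthresh (bits 21:16) into a TXDCTL-style register value. */
static uint32_t txdctl_pack(uint32_t pthresh, uint32_t hthresh,
			    uint32_t wthresh)
{
	/* Each threshold field is six bits wide. */
	assert(pthresh <= 0x3f && hthresh <= 0x3f && wthresh <= 0x3f);
	return TXDCTL_GRAN | TXDCTL_COUNT_DESC |
	       (wthresh << 16) | (hthresh << 8) | pthresh;
}

/* Hypothetical helper: read back the write-back threshold field. */
static uint32_t txdctl_wthresh(uint32_t txdctl)
{
	return (txdctl >> 16) & 0x3f;
}
```

With these assumed values, txdctl_pack(0x1f, 1, 0) reproduces the word the
suggested macro change would produce, with the write-back threshold field
set to zero.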

That is why we are looking at a solution that likely involves two
flush writes via the flush partial descriptors bits.  Just do the bit
31 set in TIDV and RDTR twice in a row and then make sure it is write
flushed.

If you wish to implement that and give it a try that would be useful
information.  We haven't had time yet to get a full repro going.
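
As a rough illustration of the double flush described above, the sketch
below runs the sequence against a mock register file. Everything here is a
stand-in: struct mock_regs, mock_write() and mock_flush() are hypothetical
and are not driver APIs, and the ordering of the four writes (TIDV, RDTR,
TIDV, RDTR) is one reading of "twice in a row", not a confirmed hardware
sequence.

```c
#include <assert.h>
#include <stdint.h>

#define FPD_BIT 0x80000000u /* bit 31: Flush Partial Descriptors */

/* Hypothetical stand-in for the NIC register file; not a driver API. */
struct mock_regs {
	uint32_t tidv;  /* last value written to TIDV */
	uint32_t rdtr;  /* last value written to RDTR */
	int fpd_writes; /* number of writes that carried the FPD bit */
	int flushes;    /* number of posted-write flushes issued */
};

static void mock_write(struct mock_regs *r, uint32_t *reg, uint32_t val)
{
	*reg = val;
	if (val & FPD_BIT)
		r->fpd_writes++;
}

/* Stands in for a register read that forces posted writes out. */
static void mock_flush(struct mock_regs *r)
{
	r->flushes++;
}

/* Set FPD in TIDV and RDTR twice in a row, then flush the writes. */
static void flush_partial_descriptors(struct mock_regs *r,
				      uint32_t tx_delay, uint32_t rx_delay)
{
	int i;

	/* The delay values must not already occupy bit 31. */
	assert(!(tx_delay & FPD_BIT) && !(rx_delay & FPD_BIT));
	for (i = 0; i < 2; i++) {
		mock_write(r, &r->tidv, tx_delay | FPD_BIT);
		mock_write(r, &r->rdtr, rx_delay | FPD_BIT);
	}
	mock_flush(r);
}
```

In a real driver the writes would go through the register accessors and
the flush would be a read of a harmless register; the mock only records
what was requested.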




Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang

2011-10-17 Thread Jesse Brandeburg
On Fri, 14 Oct 2011 10:04:26 -0700
Flavio Leitner f...@redhat.com wrote:

 
 Hi,
 
 I got a few reports so far that 82571EB models are having the
 Detected Hardware Unit Hang issue after upgrading the kernel.
 
 Further debugging with an instrumented kernel revealed that the
 socket buffer time stamp matches with the last time e1000_xmit_frame()
 was called. Also that the time stamp of e1000_clean_tx_irq() last run
 is prior to the one in socket buffer.
 
 However, ~1 second later, an interrupt is fired and the old entry
 is found. Sometimes, the scheduled print_hang_task dumps the
 information _after_ the old entry is sent (shows empty ring),
 indicating that the HW TX unit isn't really stuck and apparently
 just missed the signal to initiate the transmission.
 
 Order of events:
  (1) skb is pushed down
  (2) e1000_xmit_frame() is called
  (3) ring is filled with one entry
  (4) TDT is updated
 (5) nothing happens for little more than 1 second
  (6) interrupt is fired
  (7) e1000_clean_tx_irq() is called
  (8) finds the entry not ready with an old time stamp,
  schedules print_hang_task and stops the TX queue.
  (9) print_hang_task runs, dump the info but the old entry is now sent
 (10) apparently the TX queue is back.

Flavio, thanks for the detailed info, please be sure to supply us the
bugzilla number.

TDH is probably not moving due to the writeback threshold settings in
TXDCTL.  netperf UDP_RR test is likely a good way to test this.

I don't think the sequence is quite what you said.  We are going to
work with the hardware team to get a sequence that works right, and we
should have a fix for you soon.

 
 The following commit seems to be related to the symptoms seen above:
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3a3b75860527a11ba5035c6aa576079245d09e2a
 
  From: Jesse Brandeburg jesse.brandeb...@intel.com
  Date: Wed, 29 Sep 2010 21:38:49 + (+)
  Subject: e1000e: use hardware writeback batching
  X-Git-Tag: v2.6.37-rc1~147^2~299
  X-Git-Url:
 http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Ftorvalds%2Flinux-2.6.git;a=commitdiff_plain;h=3a3b75860527a11ba5035c6aa576079245d09e2a
  
 
  e1000e: use hardware writeback batching
 
  Most e1000e parts support batching writebacks.  The problem with this is
  that when some of the TADV or TIDV timers are not set, Tx can sit forever.
 
  This is solved in this patch with write flushes using the Flush Partial
  Descriptors (FPD) bit in TIDV and RDTR.
 
  This improves bus utilization and removes partial writes on e1000e,
  particularly from 82571 parts in S5500 chipset based machines.
 
  Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
  testing load.
 
 We have modified the instrumented kernel to include the following patch
 disabling writeback batching feature to narrow down the problem:
 
 --- debug/drivers/net/e1000e/82571.c.orig  2011-10-11 14:00:44.0 -0300
 +++ debug/drivers/net/e1000e/82571.c   2011-10-11 15:02:51.0 -0300
 @@ -2028,8 +2028,7 @@ struct e1000_info e1000_82571_info = {
  | FLAG_RESET_OVERWRITES_LAA /* errata */
  | FLAG_TARC_SPEED_MODE_BIT /* errata */
  | FLAG_APME_CHECK_PORT_B,
 -  .flags2 = FLAG2_DISABLE_ASPM_L1 /* errata 13 */
 -| FLAG2_DMA_BURST,
 +  .flags2 = FLAG2_DISABLE_ASPM_L1, /* errata 13 */
.pba= 38,
.max_hw_frame_size  = DEFAULT_JUMBO,
 
 
 and the customer confirmed that the issue has disappeared since then.
 
 Board info:
 1e:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet
 Controller (Copper) (rev 06)
 
 1e:00.0 0200: 8086:10bc (rev 06)
 Subsystem: 103c:704b
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+
 Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- 
 TAbort-
 MAbort- SERR- PERR- INTx-
 Latency: 0, Cache Line Size: 64 bytes
 Interrupt: pin B routed to IRQ 224
 Region 0: Memory at fd4e (32-bit, non-prefetchable) [size=128K]
 Region 1: Memory at fd40 (32-bit, non-prefetchable) [size=512K]
 Region 2: I/O ports at 7000 [size=32]
 Capabilities: [c8] Power Management version 2
 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
 PME(D0+,D1-,D2-,D3hot+,D3cold+)
 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
 Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
 Address: fee0  Data: 4073
 Capabilities: [e0] Express (v1) Endpoint, MSI 00
 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s 512ns,
 L1 64us
 ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
 DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+
 Unsupported

Re: [E1000-devel] 82574 DMA Burst Mode Enablement

2011-09-28 Thread Jesse Brandeburg
On Wed, 28 Sep 2011 11:39:54 -0700
Denis Radovanovic denis.radovano...@riverbed.com wrote:
 We are currently testing small packet performance on 82574, comparing
 it to 82571. Initial pktgen measurements have shown a significant
 difference in performance that is the most visible when running
 bidirectional traffic with 256 byte packets.
 
 Looking at the e1000e driver, we noticed that flag FLAG2_DMA_BURST is
 enabled for 82571 and 82572 but it is not enabled for 82574. After
 enabling the flag, the 82574 performance significantly improved,
 approaching the one on 82571.

At the time the feature was implemented we didn't have the bandwidth to
validate it on other parts besides 82571/2.

As it stands, yes you can enable it, but there will likely be some bugs
that you will run into that we already know about but don't fully have
fixed in the code.  The bugs might result in tx hangs or other issues.
I do agree that there are significant performance gains to be had via
this feature, if the bugs can all be worked out.

If this is a feature that you would really like implemented, please use
your Intel Field Agent or TME contacts to document your requirement
so we can consider it for future releases.

Thanks,
  Jesse



Re: [E1000-devel] Possible to stop external IPMI/BMC access (port 623) by bringing iface up?

2011-09-28 Thread Jesse Brandeburg
On Wed, 28 Sep 2011 11:23:50 -0700
Carsten Aulbert carsten.aulb...@aei.mpg.de wrote:
 But now we reinstalled several machines with Debian Squeeze and
 suddenly we can only query the BMC when eth0 is down. The kernel we
 use is exactly the same (2.6.32.28 or 2.6.32.46 currently), i.e. same
 binary .deb package, same config, only the userland is changed.

This is probably the driver touching a register that prevents IPMI
traffic from flowing to the bmc.  It may be a patch that Debian made
that broke it, I don't generally track debian's forks of the kernel. :-)

Can you send the output from the ethregs tool before and after bringing
the interface down?

ethregs is available on e1000.sf.net in the downloads area.
 




Re: [E1000-devel] [PATCH net-next-2.6] e1000: don't enable dma receives until after dma address has been setup

2011-09-15 Thread Jesse Brandeburg
On Wed, 14 Sep 2011 17:31:38 -0700
Dean Nelson dnel...@redhat.com wrote:

 Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a
 qemu-kvm guest system configured with two e1000 NICs can result in an
 'unable to handle kernel paging request at 0001' or 'bad
 page map in process ...' or something similar.

<snip>

 The corruption appears to result from the following...
 
  . An 'ifconfig ethN down' gets us into e1000_close(), which through
 a number of subfunctions results in:
  1. E1000_RCTL_EN being cleared in RCTL register.  [e1000_down()]
  2. dma_free_coherent() being called.  [e1000_free_rx_resources()]
 
  . An 'ifconfig ethN up' gets us into e1000_open(), which through a
    number of subfunctions results in:
      1. dma_alloc_coherent() being called.  [e1000_setup_rx_resources()]
      2. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]
      3. E1000_RCTL_EN being cleared in RCTL register.  [e1000_configure_rx()]
      4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma
         page allocated in step 1.  [e1000_configure_rx()]
      5. E1000_RCTL_EN being set in RCTL register.  [e1000_configure_rx()]
 
 During the 'ifconfig ethN up' there is a window opened, starting in
 step 2 where the receives are enabled up until they are disabled in
 step 3, in which the address of the receive descriptor dma page known
 by the NIC is still the previous one which was freed during the
 'ifconfig ethN down'. If this memory has been reallocated for some
 other use and the NIC feels so inclined, it will write to that former
 dma page with predictably unpleasant results.
 
 I realize that in the guest, we're dealing with an e1000 NIC that is
 software emulated by qemu-kvm. The problem doesn't appear to occur on
 bare-metal. Andy suspects that this is because in the emulator
 link-up is essentially instant and traffic can start flowing
 immediately. Whereas on bare-metal, link-up usually seems to take at
 least a few milliseconds. And this might be enough to prevent traffic
 from flowing into the device inside the window where E1000_RCTL_EN is
 set.

nice analysis dean, yes, we shouldn't enable rx before we have the
hardware all ready.
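
To spell out the ordering being agreed on here, the following is a small
model of the fixed sequence: the setup step leaves the enable bit alone,
and the configure step turns receives on only after the ring address and
length are programmed. struct mock_nic and the function names are
hypothetical, the RCTL_EN value is assumed to match E1000_RCTL_EN, and a
zero ring base stands in for "no valid ring" (a stale address from a freed
ring would be the real-world equivalent).

```c
#include <assert.h>
#include <stdint.h>

#define RCTL_EN 0x00000002u /* receiver enable; assumed to match E1000_RCTL_EN */

/* Hypothetical model of the receive-side registers; not a driver struct. */
struct mock_nic {
	uint32_t rctl;    /* RCTL */
	uint64_t rd_base; /* RDBAH:RDBAL */
	uint32_t rd_len;  /* RDLEN */
};

/* Nonzero if the receiver is enabled while no ring is programmed. */
static int rx_window_open(const struct mock_nic *n)
{
	return (n->rctl & RCTL_EN) && (n->rd_base == 0 || n->rd_len == 0);
}

/* Program filter bits but leave the enable bit however it was. */
static void setup_rctl(struct mock_nic *n, uint32_t filter_bits)
{
	assert(!(filter_bits & RCTL_EN));
	n->rctl = (n->rctl & RCTL_EN) | filter_bits;
}

/* Disable receives, program the ring, and only then enable receives. */
static void configure_rx(struct mock_nic *n, uint64_t ring_dma,
			 uint32_t ring_len)
{
	n->rctl &= ~RCTL_EN;
	n->rd_base = ring_dma;
	n->rd_len = ring_len;
	n->rctl |= RCTL_EN;
}
```

In this model rx_window_open() never reports an open window between
setup_rctl() and configure_rx(), which is the property the patch is after.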

You didn't mention however that the hardware is reset in e1000_down,
which will clear the RDBAL/RDBAH in real hardware.

 
 So perhaps a modification needs to be made to the qemu-kvm e1000 NIC
 emulator to delay the link-up. But in defense of the emulator, it
 seems like a bad idea to enable dma operations before the address of
 the memory to be involved has been made known.

the hardware reset code in kvm should also reset to default many
registers (almost all of them in fact) which may also end up solving
the problem.

 
 The following patch no longer enables receives in e1000_setup_rctl()
 but leaves them however they were. It only enables receives in
 e1000_configure_rx(), and only after the dma address has been made
 known to the hardware.

I still like your patch better as it is more correct.  We could also
correct the kvm virtual hardware driver.

 There are two places where e1000_setup_rctl() gets called. The one in
 e1000_configure() is followed immediately by a call to
 e1000_configure_rx(), so there's really no change functionally
 (except for the removal of the problem window). The other is in
 __e1000_shutdown() and is not followed by a call to
 e1000_configure_rx(), so there is a change functionally. But
 consider...
 
  . An 'ifconfig ethN down' (just as described above).
 
  . A 'suspend' of the system, which (I'm assuming) will find its way
 into e1000_suspend() which calls __e1000_shutdown() resulting in:
  1. E1000_RCTL_EN being set in RCTL register.
 [e1000_setup_rctl()]
 
 And again we've re-opened the problem window for some unknown amount
 of time.
 
 Signed-off-by: Andy Gospodarek a...@greyhouse.net
 Signed-off-by: Dean Nelson dnel...@redhat.com
 
 ---
 The patch below is Andy's version of a patch I came up with to
 address this problem. I liked his version better. Functionally there
 was no difference between the two.
 
 Running my version of the patch, the reproducer (see script below)
 ran for 5 days without issue before I stopped it. Without the patch,
 former dma pages were corrupted in a very short timeframe and fairly
 frequently (relatively speaking). Note that I'm also running with a
 debug patch that after step 5 has completed (mentioned above under an
 'ifconfig ethN up'...), the previous dma page is scanned to see if it
 had been 'corrupted'. So I found a higher percentage of occurrences
 than one would find if one waits for a kernel BUG.
 
 The reproducer for this problem is:
 cat > reproducer.sh <<'EOF'
 #!/bin/bash
 typeset -i i=0
 echo "eth1:down"
 ifconfig eth1 down
 sleep 2
 while :; do
   i=$i+1
   ifconfig eth0 down & ifconfig eth1 up &
   echo "$i | eth0:down eth1:up"
   wait
   sleep 2
   ifconfig eth0 up & ifconfig eth1 down &
   echo "$i | eth0:up eth1:down"
   wait
   sleep 2
 done
 EOF
 
 The e1000e looks to have the 

Re: [E1000-devel] e1000e: NIC not working (after resume?)

2011-09-09 Thread Jesse Brandeburg
On Fri, Sep 9, 2011 at 6:43 AM, Frederik Himpe fhi...@telenet.be wrote:
 [Crossposting to e1000 mailing list]

 I have a Dell Latitude E6400 which has a network card supported by the
 e1000e driver. Often (I think after a suspend/resume cycle), the network
 card does not work at all: the NIC is correctly seen by ifconfig, but
 running ethtool just returns: No such device. dhclient -v gives the
 impression that it's correctly sending out DHCPDISCOVER packets on the
 NIC, but a tcpdump running on the same machine does not see any packets
 going out.

 I'm using Debian's 3.0.0-3 kernel (corresponding with Linux 3.0.3).

 Full lspci, .config and dmesg output at
 http://artipc10.vub.ac.be/~frederik/e1000e/

 Here is some relevant summary.

 How can I find out what is going wrong?

 # ifconfig eth0
 eth0      Link encap:Ethernet  HWaddr 00:21:70:e1:bb:4c
          UP BROADCAST PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:22 Memory:f6ae-f6b0


 # ethtool eth0
 Settings for eth0:
 Cannot get device settings: No such device
 Cannot get wake-on-lan settings: No such device
 Cannot get message level: No such device
 Cannot get link status: No such device
 No data available

 # lspci -vvnn
 00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM Gigabit Network 
 Connection [8086:10f5] (rev 03)
        Subsystem: Dell Device [1028:0233]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- 
 Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- 
 TAbort- MAbort- SERR- PERR- INTx-
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at f6ae (32-bit, non-prefetchable) [disabled] 
 [size=128K]
        Region 1: Memory at f6adb000 (32-bit, non-prefetchable) [disabled] 
 [size=4K]
        Region 2: I/O ports at efe0 [disabled] [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA 
 PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME+

The above is why nothing seems to work right.  The kernel runtime
power management thinks there is no link on the port, and so is
putting the port into D3.  We need to make sure that our driver wakes
the device to read link state (the problem is that wake from D3, plus
time to get link can take ~4 seconds).

        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: fee0300c  Data: 4182
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e


 # dmesg | grep -E 'e1000e|eth0'
 [    1.027437] e1000e: Intel(R) PRO/1000 Network Driver - 1.3.10-k2
 [    1.027441] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
 [    1.027480] e1000e :00:19.0: PCI INT A - GSI 22 (level, low) - IRQ 22
 [    1.027491] e1000e :00:19.0: setting latency timer to 64
 [    1.027605] e1000e :00:19.0: irq 43 for MSI/MSI-X
 [    1.231440] e1000e :00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) 
 00:21:70:e1:bb:4c
 [    1.231444] e1000e :00:19.0: eth0: Intel(R) PRO/1000 Network Connection
 [    1.231470] e1000e :00:19.0: eth0: MAC: 7, PHY: 8, PBA No: 1004FF-0FF
 [   22.896268] e1000e :00:19.0: irq 43 for MSI/MSI-X
 [   22.952097] e1000e :00:19.0: irq 43 for MSI/MSI-X
 [   22.954132] ADDRCONF(NETDEV_UP): eth0: link is not ready
 [   24.504903] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow 
 Control: Rx/Tx
 [   24.506413] e1000e :00:19.0: eth0: 10/100 speed: disabling TSO
 [   24.508402] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [   34.788022] eth0: no IPv6 routers present
 [   41.444922] e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow 
 Control: Rx/Tx
 [   41.446325] e1000e :00:19.0: eth0: 10/100 speed: disabling TSO
 [25136.918393] e1000e :00:19.0: PME# enabled
 [25142.488050] e1000e :00:19.0: BAR 0: set to [mem 0xf6ae-0xf6af] 
 (PCI address [0xf6ae-0xf6af])
 [25142.488058] e1000e :00:19.0: BAR 1: set to [mem 0xf6adb000-0xf6adbfff] 
 (PCI address [0xf6adb000-0xf6adbfff])
 [25142.488066] e1000e :00:19.0: BAR 2: set to [io  0xefe0-0xefff] (PCI 
 address [0xefe0-0xefff])
 [25142.488085] e1000e :00:19.0: restoring config space at offset 0xf (was 
 0x100, writing 0x10a)
 [25142.488110] e1000e :00:19.0: restoring config space at offset 0x1 (was 
 0x10, writing 0x100107)
 [25142.488167] e1000e :00:19.0: PME# disabled
 [25142.510966] e1000e :00:19.0: PCI INT A disabled
 [25142.510971] e1000e :00:19.0: PME# enabled
 [25143.468668] e1000e :00:19.0: restoring config space at offset 0xf (was 
 0x100, writing 0x10a)

Re: [E1000-devel] vlan steering

2011-09-08 Thread Jesse Brandeburg
On Tue, 6 Sep 2011 02:19:44 -0700
bill4carson bill4car...@gmail.com wrote:

 Hi, guys
 
 
 Just a quick question about vlan steering,  does 82599 support this 
 feature?
 I didn't see any description about it in the 82576/82599 specification

I think what you're looking for is the VMDQ mode of the hardware, where
either VLAN id or MAC address selects which queue.

Our drivers currently don't implement this support fully.

 The below description is my understanding of vlan steering, correct
 me if I'm wrong about this concept.
 
   +- 
 Queue 1
   ++|
   packets from wire| |+- Queue 2
   -  | sort method| ---|
| |+- 
 Queue x
   ++|
   +- 

The picture is kinda hosed, but what happens is that a packet is
received by the hardware, the queue is picked based on hardware
configuration, and the packet is delivered to a descriptor in that
queue.
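
As a software picture of that hardware decision, here is a minimal sketch
of VLAN-based queue selection. The rule table, queue count and default
queue are hypothetical and do not describe how the 82599 is actually
programmed; only the 12-bit VLAN ID extraction from the 802.1Q TCI is
standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QUEUES    4 /* hypothetical number of RX queues */
#define DEFAULT_QUEUE 0 /* queue used when no rule matches */

/* Hypothetical steering rule: one VLAN ID mapped to one RX queue. */
struct vlan_rule {
	uint16_t vlan_id;
	unsigned queue;
};

/* Pick an RX queue from the 802.1Q TCI of a received frame. */
static unsigned pick_queue(const struct vlan_rule *rules, size_t n,
			   uint16_t tci)
{
	uint16_t vid = tci & 0x0fff; /* low 12 bits of the TCI are the VLAN ID */
	size_t i;

	for (i = 0; i < n; i++) {
		assert(rules[i].queue < NUM_QUEUES);
		if (rules[i].vlan_id == vid)
			return rules[i].queue;
	}
	return DEFAULT_QUEUE;
}
```

The priority bits in the upper part of the TCI are masked off before the
lookup, so frames on the same VLAN land in the same queue regardless of
their 802.1p priority.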

 Last queue
 sort method matters most.
 
 VLAN steering sorts packets from the wire into different queues based
 on the VLAN ID in the L2 header. This could relieve the upper-layer
 protocol of the burden of sorting in software.

We currently already offload the vlan ID (and strip it from the packet)
so there isn't a whole lot of offload overhead, AFAIK.



Re: [E1000-devel] e1000e check_mng_mode issue

2011-08-24 Thread Jesse Brandeburg
On Tue, 23 Aug 2011 14:38:18 -0700
Andy Cress andy.cr...@us.kontron.com wrote:

 Tushar,
 
 Thanks for running this down.  So that means that the current driver
 implementation would never allow a NIC which has a BMC sideband
 connection physically to ever power off the PHY.
 
 That doesn't seem like the right approach.  I realize that it would be
 difficult to convey whether an IPMI session is active or not and it

I'm not an ipmi expert, but as I understand it, if the BMC firmware is
*running on our networking chip* then I think we could identify that.
If we are running as JUST the smbus transport layer, we just have a
simple wire interface for smbus, that can receive packets from an
external BMC at any time.  The NIC actually knows nothing about the
smbus connection except maybe that it *might* have added a MAC address
to our receive filter.

 may not be desirable to power down the PHY if there could be incoming
 IPMI management traffic, but there should be a way to detect if the
 IPMI configuration has the channel enabled or not.  If IPMI LAN is not
 enabled, the PHY can safely be powered down.  

In the case of an external BMC, this becomes really difficult to know
(I'm not sure it isn't possible, however)

 
 Obviously it would be simple to ignore these bits in the driver to get
 it to work, but that's not optimal.  
 
 If the BMC is asserting these bits without regard to the configuration
 of the IPMI LAN channels, perhaps that is where the bug could be
 pursued, to fix the firmware?  Or should the driver use another
 mechanism to discern whether IPMI LAN is enabled or not (KCS, ...)?

We have had zero luck getting external BMC vendors to update their
code.  One time we managed to get Intel to fix its external BMC.  The
lesson we've learned so far is that changing BMCs is really, really hard.

I think the bits are actually decided by the type of NVRAM image we
are loading, it is statically set at manufacturing time.

 
 Andy
 
 -Original Message-
 From: Dave, Tushar N [mailto:tushar.n.d...@intel.com] 
 Sent: Tuesday, August 23, 2011 5:07 PM
 To: Andy Cress; e1000-devel@lists.sourceforge.net
 Subject: RE: e1000e check_mng_mode issue
 
 Andy,
 These bits gets set by internal BMC firmware when the code is
 initialized.  It does not depend on whether there is currently an
 active IPMI session. 
 
 -Tushar
 
 -Original Message-
 From: Andy Cress [mailto:andy.cr...@us.kontron.com] 
 Sent: Friday, August 12, 2011 2:30 PM
 To: Dave, Tushar N; e1000-devel@lists.sourceforge.net
 Subject: RE: e1000e check_mng_mode issue
 
 Tushar,
 
 But if 'management mode' means that IPMI LAN is enabled or in use,
 then this indication is yielding a false result, because IPMI LAN is
 disabled.  Those bits are always set regardless of the state of the
 IPMI LAN configuration. 
 
 So what drives those bits?   Does the IPMI firmware drive them, or do
 they depend on the NIC firmware, or ...?  
 
 Andy
 
 -Original Message-
 From: Dave, Tushar N [mailto:tushar.n.d...@intel.com] 
 Sent: Friday, August 12, 2011 4:58 PM
 To: Andy Cress; e1000-devel@lists.sourceforge.net
 Subject: RE: e1000e check_mng_mode issue
 
 Andy,
 
 The defined constant name (i.e. E1000_MNG_IAMT_MODE) is a little
 confusing. The objective of e1000_check_mng_mode_generic() is to
 check whether Management is enabled or not. It doesn't care which MNG
 mode is enabled (e.g. AMT or IPMI).
 
 If Management is enabled then FWSM (bits 3:1) should have the value 0x3
 (this value is loaded from EEPROM word 13h).
 So all e1000_check_mng_mode_generic() does is check whether FWSM bits
 3:1 equal 0x3.
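
The check described above can be sketched in a few lines.  This is only
an illustration of the bit test (mode field in bits 3:1 compared with
0x3), not the driver source; the constant and function names are chosen
here for readability and are assumptions.

```python
# Illustrative sketch of the "management enabled" test described above.
# Names are hypothetical; only the field position (bits 3:1) and the
# expected value (0x3) come from the message.
MODE_SHIFT = 1                   # mode field starts at bit 1
MODE_MASK = 0x7 << MODE_SHIFT    # bits 3:1 -> 0xE
MNG_MODE_ENABLED = 0x3           # value that means "management enabled"


def check_mng_mode(reg: int) -> bool:
    """Return True when bits 3:1 of the status register equal 0x3."""
    return (reg & MODE_MASK) == (MNG_MODE_ENABLED << MODE_SHIFT)
```

With these assumptions a register value of 0x6 (binary 0110) reports
management as enabled, while 0x4 or 0xE does not.  Note the test says
nothing about whether an IPMI session is active or the LAN channel is
enabled, which is the issue raised in this thread.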
 
 Let me know if you have any more queries.
 
 -Tushar
 
 
 -Original Message-
 From: Andy Cress [mailto:andy.cr...@us.kontron.com] 
 Sent: Friday, August 12, 2011 1:23 PM
 To: Dave, Tushar N; e1000-devel@lists.sourceforge.net
 Subject: RE: e1000e check_mng_mode issue
 
 Tushar,
 
 Right, my eth0 is 80003ES2LAN.   
 Attached is the 'ethtool -e eth0' output (eth0.e).
 
 Andy
 -Original Message-
 From: Dave, Tushar N [mailto:tushar.n.d...@intel.com]
 Sent: Friday, August 12, 2011 12:48 PM
 To: Andy Cress; e1000-devel@lists.sourceforge.net
 Subject: RE: e1000e check_mng_mode issue
 
 Andy,
 Thanks for your patience. I am looking into this.
 (assuming your eth0 device is 80003ES2LAN) Can you provide 'ethtool -e
 eth0' output?
 
 -Tushar
 
 
 -Original Message-
 From: Andy Cress [mailto:andy.cr...@us.kontron.com]
 Sent: Tuesday, August 09, 2011 2:21 PM
 To: e1000-devel@lists.sourceforge.net
 Subject: [E1000-devel] e1000e check_mng_mode issue
 
 
 This may apply to other NICs with an IPMI BMC instead of AMT, but
 here's my configuration:
 Baseboard: Intel S5000PAL
 Onboard NICs (2): 80003ES2LAN
 And this has an IPMI BMC on the baseboard with sideband connections to
 the onboard NICs.
 
 # ethtool -i eth0
 driver: e1000e
 version: 1.0.2.5-NAPI
 firmware-version: 1.0-0
 bus-info: :07:00.0
 
 For the e1000e driver, the 

Re: [E1000-devel] e1000e check_mng_mode issue

2011-08-24 Thread Jesse Brandeburg
On Wed, 24 Aug 2011 10:02:22 -0700
Andy Cress andy.cr...@us.kontron.com wrote:

 Thanks Jesse, that helps.
 
 The upshot of all this is that I believe we need an alternative way of
 detecting mng_mode for onboard IPMI BMCs, which could be implemented
 in a check_mng_mode_ipmi() routine perhaps.  The cases I'm most
 interested in are Intel S5000PAL and S5520UR motherboards
 (80003ES2LAN and 82575EB NICs, respectively).  
 
 Option 1: check_mng_mode_ipmi() I know that this information could be
 queried using a local KCS interface with the IPMI GetChannelAccess
 command, which should not be too difficult if the OpenIPMI driver is
 already there (#ifdef CONFIG_IPMI).  

What about something not so kernel based?  It seems to me all we are
missing is a user-space override that tells the driver "yeah, I know
(as the administrator) that I'm not using IPMI, so port power down is
okay."

We could do this with a small driver enhancement possibly using ethtool
private flags that would allow the driver to override the
check_mng_mode_generic result, and power down the phy anyway.

This functionality in the driver would allow for a user space script to
query ipmi, make sure it was disabled, and then enable the override
via ethtool.
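
A rough sketch of the decision logic being proposed, for discussion
only.  Nothing here exists in the driver today: the function names and
the idea of a single boolean override are assumptions, and how the
script learns whether IPMI LAN is enabled is left out.

```python
# Hypothetical sketch of the proposed override; not existing driver code.

def override_allowed(ipmi_lan_enabled: bool) -> bool:
    """User-space side: only request the override once IPMI LAN is
    confirmed to be disabled."""
    return not ipmi_lan_enabled


def may_power_down_phy(mng_mode_detected: bool, admin_override: bool) -> bool:
    """Driver side: the PHY may be powered down when no management mode
    is detected, or when the administrator has explicitly overridden
    the hard-coded management bits."""
    return admin_override or not mng_mode_detected
```

The point of splitting it this way is that the driver stays simple (one
extra flag), while the policy of deciding when the override is safe
lives in a script the administrator controls.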

 Option 2: Allow a compile-time driver option to toggle whether or not
 the check_mng_mode_generic() returns according to the existing
 hard-coded bits, or returns a hard-coded zero (allowing those
 functions to occur), which would require the user to ensure that IPMI
 LAN had been disabled before exercising this. 
 
 Would you like me to take a stab at implementing option 1, or do you
 have a better idea? 
 
 Andy
 
 -Original Message-
 From: Jesse Brandeburg [mailto:jesse.brandeb...@intel.com] 
 Sent: Wednesday, August 24, 2011 12:25 PM
 To: Andy Cress
 Cc: Dave, Tushar N; e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] e1000e check_mng_mode issue
 
 On Tue, 23 Aug 2011 14:38:18 -0700
 Andy Cress andy.cr...@us.kontron.com wrote:
 
  Tushar,
  
  Thanks for running this down.  So that means that the current driver
  implementation would never allow a NIC which has a BMC sideband
  connection physically to ever power off the PHY.
  
  That doesn't seem like the right approach.  I realize that it would
  be difficult to convey whether an IPMI session is active or not and
  it
 
 I'm not an ipmi expert, but as I understand it, if the BMC firmware is
 *running on our networking chip* then I think we could identify that.
 If we are running as JUST the smbus transport layer, we just have a
 simple wire interface for smbus, that can receive packets from an
 external BMC at any time.  The NIC actually knows nothing about the
 smbus connection except maybe that it *might* have added a MAC address
 to our receive filter.
 
  may not be desirable to power down the PHY if there could be
  incoming IPMI management traffic, but there should be a way to
  detect if the IPMI configuration has the channel enabled or not.
  If IPMI LAN is not enabled, the PHY can safely be powered down.  
 
 In the case of an external BMC, this becomes really difficult to know
 (I'm not sure it isn't possible, however)
 
  
  Obviously it would be simple to ignore these bits in the driver to
  get it to work, but that's not optimal.  
  
  If the BMC is asserting these bits without regard to the
  configuration of the IPMI LAN channels, perhaps that is where the
  bug could be pursued, to fix the firmware?  Or should the driver
  use another mechanism to discern whether IPMI LAN is enabled or not
  (KCS, ...)?
 
 we have had zero luck getting external BMC vendors to update their
 code.  One time we managed to get Intel to fix its external BMC.  The
 lesson we've learned so far is changing BMCs is really really hard.
 
 I think the bits are actually decided by the type of NVRAM image we
 are loading, it is statically set at manufacturing time.
 
  
  Andy
  
  -Original Message-
  From: Dave, Tushar N [mailto:tushar.n.d...@intel.com] 
  Sent: Tuesday, August 23, 2011 5:07 PM
  To: Andy Cress; e1000-devel@lists.sourceforge.net
  Subject: RE: e1000e check_mng_mode issue
  
  Andy,
  These bits gets set by internal BMC firmware when the code is
  initialized.  It does not depend on whether there is currently an
  active IPMI session. 
  
  -Tushar
  
  -Original Message-
  From: Andy Cress [mailto:andy.cr...@us.kontron.com] 
  Sent: Friday, August 12, 2011 2:30 PM
  To: Dave, Tushar N; e1000-devel@lists.sourceforge.net
  Subject: RE: e1000e check_mng_mode issue
  
  Tushar,
  
  But if 'management mode' means that IPMI LAN is enabled or in use,
  then this indication is yielding a false result, because IPMI LAN is
  disabled.  Those bits are always set regardless of the state of the
  IPMI LAN configuration. 
  
  So what drives those bits?   Does the IPMI firmware drive them, or
  do they depend on the NIC firmware, or ...?  
  
  Andy
  
  -Original Message-
  From: Dave

Re: [E1000-devel] Spam

2011-08-03 Thread Jesse Brandeburg
2011/8/1 CLOSE Dave dave.cl...@us.thalesgroup.com:
> I've tried asking privately to the owner of this list but have seen no
> response. Is there some reason why we can't filter this crap? Does
> anyone manage the list and remove offenders?

Hi Dave, yeah, sorry about the spam to this list, but we don't want to
make it a closed list because it is a community support mailing list,
and because of that we don't limit the traffic to members only.

Unfortunately sourceforge's spam control via mailman leaves a lot to
be desired and we've configured it as much as we can to block spam,
but it still lets ~5 or so spams a week through.  I hardly ever see
them however because my local spam filter (spambayes) catches them.

If you have any suggestions or anything else we can do let us know.
- Jesse



Re: [E1000-devel] non_eop_descs

2011-08-02 Thread Jesse Brandeburg
On Tue, Aug 2, 2011 at 6:12 PM, Richard Scobie rich...@sauce.co.nz wrote:
> I have a NAS box set up with bridged interfaces, a couple of which are
> 82598EB AF.
>
> The host boxes direct attached to these are performing more or less
> identical tasks, but one interface shows no non_eop_descs and the other
> many.
<snip>

> What is this measuring and/or what causes them please?

This is a statistic that counts how many times the hardware chained
buffers together in order to make a single frame for receive to the
host.  For instance, if every RX buffer is 2kB, and you have jumbos
enabled, OR, in the case of 82599, RSC (receive side coalescing - like
hardware LRO), then to receive 9kB you might need 5 * 2kB buffers.
Each of the first four would not have EOP (end of packet) set, and the
5th would.
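
The arithmetic in that example can be written out as a small sketch.
The function name is made up for illustration, and it assumes every
receive buffer has the same size.

```python
# Illustrative arithmetic only; the function name is hypothetical.

def rx_descriptors_for_frame(frame_bytes: int, buffer_bytes: int = 2048):
    """Return (total descriptors, descriptors without EOP) for one frame,
    assuming fixed-size receive buffers chained back to back."""
    if frame_bytes <= 0 or buffer_bytes <= 0:
        raise ValueError("sizes must be positive")
    total = -(-frame_bytes // buffer_bytes)   # ceiling division
    return total, total - 1                   # only the last one has EOP
```

For a 9kB frame (9 * 1024 bytes) with 2kB buffers this gives five
descriptors, four of them counted as non-EOP; a standard 1500-byte
frame fits in one buffer and adds nothing to the counter.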

This is basically a debug statistic to help us developers have a
better picture of what kind of receives are being done by the
hardware/driver.



Re: [E1000-devel] Getting information from Users

2011-06-23 Thread Jesse Brandeburg
On Thu, Jun 16, 2011 at 12:21 PM, Martin Owens docto...@gmail.com wrote:
 Hey devels,

 I'm updating my bug report with requested information and thought I
 might as well make a script to automatically pull all the information
 together.

This is awesome, thank you.  I think it needs some minor tweaks,
however, to make sure we get all the relevant info.


 http://paste.ubuntu.com/628133/

 This script is very simple to use:

 sudo ./collect-info.sh ethX

 Then just post the tar.gz file containing all the required info, it
 submits only the relevant section of the dmesg log from the modprobe as
 well as taking care over selecting the driver the eth device is using.

 Should work for any eth driver, not just e1000e. It would be good if we
 could point users to the script when they try and report bugs. Please
 pass upstream to the linux-networking mailing list as required.

 Best Regards, Martin Owens




Re: [E1000-devel] Question on net_stats->rx_dropped setting to 0

2011-05-29 Thread Jesse Brandeburg
You have to overrun the fifo on the hardware to see rx_dropped error
from hardware.  Currently your cpu is fast enough to keep up with the
packet load.

On Wed, May 25, 2011 at 7:32 PM, Filo FeFi j11...@yahoo.com wrote:
 Ah!

 I've been looking for it in kernel version 2.6.18 which doesn't seem
 to have the function.  I should have mentioned it.

 In my test, I'm sending many packets to the ixgbe:
 kernel 2.6.18
 ixgbe 3.3.9 w/o NAPI

 I'm seeing ixgbe's call to netif_rx() returning NET_RX_DROP, and it is
 incrementing adapter->rx_dropped_backlog.  However, this value isn't
 reported by ifconfig's rx dropped.

 I can see ixgbe_ethtool.c sends it to ethtool, so I can use that; also,
 as per Eric Dumazet's earlier email, I see that /proc/net/softnet_stat
 drop count being incremented in the netif_rx() function.
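
For anyone who wants to watch that counter, here is a minimal sketch
that pulls the per-CPU drop counts out of /proc/net/softnet_stat text.
It assumes the usual layout of one row per CPU with hexadecimal
columns, where the second column is the drop count; the function name
is made up for this example.

```python
# Minimal sketch; assumes one row per CPU, hex columns, and that the
# second column is the backlog drop count.

def softnet_drops(text: str) -> list:
    """Return the drop count of each CPU row as an integer."""
    drops = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            drops.append(int(fields[1], 16))
    return drops
```

On a live system the text would come from reading
/proc/net/softnet_stat; summing the list before and after a test run
shows how many packets netif_rx() dropped in between.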

 But so far, I keep seeing 0 in ifconfig's RX dropped.  I'm wondering
 under what situation can I see something other than 0.

 Thanks,
 Ching


 --- On Wed, 5/25/11, Alexander Duyck alexander.h.du...@intel.com wrote:

 From: Alexander Duyck alexander.h.du...@intel.com
 Subject: Re: [E1000-devel] Question on net_stats->rx_dropped setting to 0
 To: Filo FeFi j11...@yahoo.com
 Cc: e1000-devel@lists.sourceforge.net e1000-devel@lists.sourceforge.net, 
 Skidmore, Donald C donald.c.skidm...@intel.com
 Date: Wednesday, May 25, 2011, 11:57 AM
 The function should be around line
 1500 in /net/core/dev.c of the Linux
 kernel.  I've included a link to it in lxr below.

 http://lxr.linux.no/#linux+v2.6.39/net/core/dev.c#L1498

 Thanks,

 Alex

 On 05/25/2011 02:41 PM, Filo FeFi wrote:
  Hi Don,
 
  Could you please elaborate a little on the dev_forward_skb() ?
  Where can I find that function?
 
  I was about to conclude that ixgbe always report 0 for RX drop,
  but I would like to know the correct answer.
 
  Thanks,
  Ching
 
  --- On Mon, 5/23/11, Skidmore, Donald C donald.c.skidm...@intel.com wrote:
 
  From: Skidmore, Donald C donald.c.skidm...@intel.com
  Subject: RE: [E1000-devel] Question on net_stats->rx_dropped setting to 0
  To: Filo FeFi j11...@yahoo.com, e1000-devel@lists.sourceforge.net e1000-devel@lists.sourceforge.net
  Date: Monday, May 23, 2011, 5:55 PM
 
  Hi Ching,
 
  As you noted we (ixgbe) don't modify this value, other than
  initializing it to zero.  However, elsewhere in the stack it is
  modified, one example being dev_forward_skb().  So ixgbe devices may
  report rx_dropped as something other than 0.
 
  Thanks,
  -Don
 
  -Original Message-
  From: Filo FeFi [mailto:j11...@yahoo.com]
  Sent: Thursday, May 19, 2011 7:19 PM
  To: e1000-devel@lists.sourceforge.net
  Subject: [E1000-devel] Question on net_stats->rx_dropped setting to 0
 
  Dear ixgbe developers:
 
  I'm debugging a problem where some frames get dropped by the ixgbe
  driver (version 2.0.44-k2), i.e. /proc/net/dev drop is not 0.
  Reading the ixgbe-3.3.9/2.0.44.13/2.0.44.14 source, I see the line
  (in ixgbe_main.c ixgbe_update_stats()):
  net_stats->rx_dropped = 0;
  So, does this mean that ixgbe always reports 0 for RX dropped?
  Under what circumstances would /proc/net/dev's drop count for ixgbe
  be incremented/changed from 0?
 
  Thank you,
  Ching Tai
  (650) 506-1454
 
 
 
 

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Jesse Brandeburg
On 1/31/2011 4:06 PM, Allan, Bruce W wrote:
 -Original Message-
 From: Nix [mailto:n...@esperi.org.uk]
 Sent: Monday, January 31, 2011 3:31 PM
 To: Allan, Bruce W
 Cc: e1000-devel@lists.sourceforge.net
 Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes 
 continue in
 2.6.37

 On 31 Jan 2011, Bruce W. Allan spake thusly:

 From: Nix [mailto:n...@esperi.org.uk]
 I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
 the lspci output is *exactly the same*, i.e. even there lspci claims that
 ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
 state persists across a poweroff, the problem disappears upon a simple
 reboot into 2.6.35.4, and does not recur in that kernel release.

 Which kernel versions?  The above mentioned are all the same???

 Yes. 2.6.35.4..2.6.37 have no differences whatsoever in their lspci output
 for my 82574L cards.

 I am... confuzzled, but am happy to try turning L0s/L1 off (if I can
 figure out how to do it: setpci is... not the most friendly of tools
 and I've never even looked at its manpage before).
 
 ASPM is enabled/disabled via bits 1:0 of byte 16 in the Express Endpoint
 capability register.  First see what is in this byte with the following:
 
 # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b
 
 where [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] is the slot information
 for your 82574.  I'm guessing that command will return 43 (hex) to indicate
 ASPM L0s (bit 0) and ASPM L1 (bit 1) are both enabled based on your previous
 lspci output.  Now, re-write the byte with bits 1:0 set to 10b (or 42 hex)
 to disable ASPM L0s:
 
 # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=42
 
 or 00b (40 hex) to disable both ASPM L0s and L1:
 
 # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=40
 
 and verify with 'lspci -vvv' that ASPM L0s [and L1] are disabled.
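
The byte values used in those commands can be double-checked with a
small sketch.  It only models the two ASPM bits described above (bit 0
for L0s, bit 1 for L1) and leaves every other bit of the byte
untouched; the helper names are made up for this example.

```python
# Illustrative helpers for the ASPM bits (1:0) of the byte read above.
ASPM_L0S = 0x1   # bit 0
ASPM_L1 = 0x2    # bit 1
ASPM_MASK = ASPM_L0S | ASPM_L1


def decode_aspm(byte: int) -> dict:
    """Report which ASPM states are enabled in the given byte."""
    return {"L0s": bool(byte & ASPM_L0S), "L1": bool(byte & ASPM_L1)}


def set_aspm(byte: int, l0s: bool, l1: bool) -> int:
    """Return the byte with only bits 1:0 rewritten."""
    value = byte & 0xFF & ~ASPM_MASK
    if l0s:
        value |= ASPM_L0S
    if l1:
        value |= ASPM_L1
    return value
```

Starting from 0x43, clearing only L0s gives 0x42 and clearing both
gives 0x40, which matches the values written with setpci above.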

Please, for our benefit, file a bug at e1000.sf.net (if you have not
already) so you can attach the .config and full dmesg file from a
non-working kernel, also please attach the full lspci -vvv output.

The reason I'm asking for this is that the kernel may actually be
configured to not do ASPM at all (CONFIG_PCIEASPM=n), yet it still
prints messages as if it did something [1].

[1] http://lxr.linux.no/linux+v2.6.37/include/linux/pci-aspm.h#L41



Re: [E1000-devel] NAPI in e1000e

2010-11-10 Thread Jesse Brandeburg
2010/11/1 xiaolin chan...@yeah.net:
> In e1000 driver, there is ew32(IMC, ~0) in the function of e1000_intr before
> scheduling adapter->napi.
>
> However, there is no such kind operation in e1000e.
>
> My question is whether NIC hardware irq is disabled during the NAPI/ksoftirqd
> processing?

Yes, it is disabled by the IAM (auto-mask) register when the
interrupt is asserted and the ICR register is read.
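
To make the auto-mask behaviour concrete, here is a toy model.  It is
not hardware-accurate and not driver code: the class and its methods
are invented for illustration, and it assumes only what is stated
above, namely that reading ICR while an interrupt is asserted masks the
causes selected in IAM.

```python
# Toy model of interrupt auto-masking; all names are hypothetical.

class ToyNic:
    def __init__(self, iam: int):
        self.iam = iam   # causes to auto-mask on ICR read
        self.ims = 0     # currently enabled (unmasked) causes
        self.icr = 0     # pending causes

    def enable(self, bits: int) -> None:
        self.ims |= bits

    def raise_cause(self, bits: int) -> None:
        self.icr |= bits

    def interrupt_asserted(self) -> bool:
        return bool(self.icr & self.ims)

    def read_icr(self) -> int:
        """Return pending causes; if an interrupt was asserted, mask the
        causes selected by IAM so no new interrupt fires during polling."""
        cause = self.icr
        if self.interrupt_asserted():
            self.ims &= ~self.iam
        self.icr = 0
        return cause
```

In this model the interrupt handler never has to write a mask-clear
register itself: after the ICR read the causes stay masked until the
poll routine finishes and re-enables them with enable().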



Re: [E1000-devel] [PATCH] e1000e: Intel 82571EB: Don't wait for MNG cycle on unmanaged chips

2010-08-30 Thread Jesse Brandeburg
On Fri, Aug 27, 2010 at 12:10 PM, Kyle Moffett
kyle.d.moff...@boeing.com wrote:
 The Intel 82571EB chipset can be used in an unmanaged configuration as a
 fast dual-port Gig-E controller.  Unfortunately a board constructed that
 way would fail to correctly come up because the driver polls for the
 completion of a management cycle that will never occur.

 To resolve this problem, we disable the poll and error return on chips
 whose EEPROMs indicate no management.  As a protection against
 misconfigured chipsets, we still delay for the entire management poll
 timeout.

 Signed-off-by: Kyle Moffett kyle.d.moff...@boeing.com

Hi Kyle, thanks for submitting this patch.  Are you fixing this
problem for a device that is a LOM?  The reason I ask is that most if
not all of our current eeprom images require some firmware interaction
to correctly initialize the PHY when the part is reset, even for the
no_mng (no manageability) case.

Your code below will avoid reading of and waiting for the cfg_done
bit, which means that the firmware could end up racing with the
driver, with them both trying to configure the part.

Was there a specific bug you were trying to fix, and can you reply
(privately to me, if you prefer) with your ethtool -e ethX output?

The concern here is that you may simply have an out of date eeprom
image, which might fix the original issue and get the driver to work
correctly, as the behavior you are describing is not how it should
work according to our design.

At the very least we would like to reproduce your issue here so we can
investigate further.

Jesse



Re: [E1000-devel] carrier detection issues at 10GB on XAUI with ixgbe driver on 2.6.27 x86 board

2010-05-18 Thread Jesse Brandeburg
On Tue, 2010-05-18 at 13:53 -0700, Chris Friesen wrote:
 I'm seeing some strange behaviour with an 82599 using XAUI at 10GB.
 Intermittently we get a scenario where it seems to get stuck in the
 following loop:
 
 link detected as up
 40-45ms delay
 link detected as down
 2 sec delay
 
 The following is a detailed timeline for one specific event.  Timestamps
 are in microseconds:
 
 762926422: link tests as down, LINKS register 0x34480100
 (100ms gap)
 763026224: audit detects link up, LINKS register 0x744bef80
 763026254: receive link state change interrupt via pci message, triggers
 watchdog to run (but it's already running)
 763026260: link up message printed to log stream
 763026383: link tests as up, LINKS register 0x744bef80
 (45ms gap)
 763071115: receive link state change interrupt via pci message
 763071134: link tests as down, LINKS register 0x34480f00
 763170935: link tests as down, LINKS register 0x34480100
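
The gaps in that timeline can be recomputed directly from the
timestamps.  The sketch below just subtracts neighbouring entries (one
representative event per timestamp is listed); the short labels are
paraphrased from the log lines above.

```python
# Timestamps (microseconds) copied from the timeline above.
EVENTS = [
    (762926422, "link tests as down"),
    (763026224, "audit detects link up"),
    (763026383, "link tests as up"),
    (763071115, "link state change interrupt"),
    (763071134, "link tests as down"),
    (763170935, "link tests as down"),
]


def gaps_us(events):
    """Return the time in microseconds between consecutive events."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
```

This gives roughly 99.8 ms before the link is seen up, about 44.7 ms
until the interrupt reporting the loss, and another 99.8 ms until the
next check, which is in line with the "100ms gap" and "45ms gap" notes
in the log.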
 
 Basically, as far as I can tell the LINKS register values match what we
 would expect to see if the far end was going up and down.  However, the
 logs we have from the switch card (which admittedly don't give
 register-level information) don't show it bouncing the link up and down
 this fast.
 
 Any ideas what might be happening here?

None that immediately come to mind; however, I have forwarded this to
our hardware engineering team to take a look.

 To save some time looking at the datasheet, the relevant bits in the
 LINKS register are interpreted as follows. For the 2nd and 3rd values
 I'll only give the deltas against the previous one.
 
 744bef80:
 link is up
 10G align status good
 10G lane sync status good all lanes
 signal detected on all four lanes of 10G parallel
 link status is up
 
 34480f00:
 link not up
 10G align status failed
 10g parallel lane sync status is failed
 link status is down
 
 34480100:
 signal detected only on lane 0 of 10G parallel (lanes 1/2/3 no signal
 detected)
 
 
 So basically we start out with no signal, then after 100ms we transition
 to the proper register values for a normal up link, then 45ms later we
 lose the alignment status (but still have good lane sync status), and
 finally 100ms later we lose signal detect on 3 of the 4 lanes.
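
The same interpretation can be reproduced with a small decoder.  The
bit positions below are an assumption: they were chosen so that the
three register values above decode to the descriptions given, and they
should be checked against the 82599 datasheet before being relied on.

```python
# Assumed bit positions, checked only against the three values quoted
# above; verify against the datasheet.
LINK_UP_BIT = 30        # overall "link is up"
ALIGN_STATUS_BIT = 17   # 10G align status
LANE_SYNC_SHIFT = 13    # bits 16:13, one bit per lane
SIGNAL_DET_SHIFT = 8    # bits 11:8, one bit per lane
LINK_STATUS_BIT = 7     # link status


def decode_links(value: int) -> dict:
    """Split a LINKS register value into the fields discussed above."""
    return {
        "link_up": bool((value >> LINK_UP_BIT) & 1),
        "align_ok": bool((value >> ALIGN_STATUS_BIT) & 1),
        "lane_sync": (value >> LANE_SYNC_SHIFT) & 0xF,
        "signal_detect": (value >> SIGNAL_DET_SHIFT) & 0xF,
        "link_status": bool((value >> LINK_STATUS_BIT) & 1),
    }
```

Under these assumptions 0x744bef80 decodes as link up with all four
lanes synced and detected, 0x34480f00 as link down with alignment and
lane sync lost but signal still present on all lanes, and 0x34480100 as
signal detected on lane 0 only.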

Does the link eventually come up?  We may need to get an eeprom dump
from the 82599 part you're working on as well.

ethtool -e ethX 

should suffice.


-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] Missing VLAN Header

2010-03-22 Thread Jesse Brandeburg
On Mon, Mar 22, 2010 at 7:57 AM, Andreas Grau
andreas.g...@ipvs.uni-stuttgart.de wrote:
 Hi,

 We are currently experimenting with vlan on a 10GE i82599 nic. Linux 2.6.18
 with ixgbe version 2.0.62.4-NAPI is used on top of XEN 3.1.2.

 For the experiments we are using the following scenario:

  -----------  -------------        -------------  -----------
  |  domU.1 |  |   dom0.1  |        |  dom0.2   |  |  domU.2 |
  |         |  |           |        |           |  |         |
  | 1.0.0.1 |  |           |        |           |  | 1.0.0.2 |
  | vlan100 |  |  bridge   |        |  bridge   |  | vlan100 |
  |    |    |  |  |    |   |        |  |    |   |  |    |    |
  |  eth0   |  | vif  eth0 |        | vif  eth0 |  |  eth0   |
  -----------  -------------        -------------  -----------
       |          |    |               |    |           |
       ------------    ---crossover-----    -------------
                          10GE cable

There is a bug on 82599 where VLAN header stripping is not disabled
correctly in promiscuous mode.  The fix is pretty simple, but it has not
been released yet.

 We now execute in domU.1 ping 1.0.0.2. Unfortunately the ping-request is not
 answered.

 Running on dom0.1 tcpdump -i eth0 gives (as expected):
 16:44:21 vlan 100, p 0, ARP, Request who-has 1.0.0.2 tell 1.0.0.1, length 28

 Running on dom0.2 tcpdump -i eth0 gives:
 16:44:21 ARP, Request who-has 1.0.0.2 tell 1.0.0.1, length 42
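
One way to check a captured frame for the missing tag is to look for
the 802.1Q TPID (0x8100) right after the two MAC addresses.  The sketch
below is a minimal illustration with a made-up function name; it
handles a single tag only and ignores stacked (QinQ) tags.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_vlan(frame: bytes):
    """Return {'vid', 'priority'} if the Ethernet frame carries an
    802.1Q tag, or None if it is untagged or too short."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return {"vid": tci & 0x0FFF, "priority": tci >> 13}
```

A frame like the one seen on dom0.1 would decode to VLAN ID 100 with
priority 0, while the frame seen on dom0.2 would return None because
the tag has already been stripped by the time it reaches the capture.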

 For some reason the vlan header is removed? Could anyone tell me why?

 Cheers Andreas

 PS: Running the same scenario using another gigabit nic and the igb driver,
 everything works.

We'll have a release out soon with that fix.



Re: [E1000-devel] recent e100 fixes cause kernel panic?

2010-03-19 Thread Jesse Brandeburg
Added netdev, the place to talk about in-kernel driver problems.

On Thu, 2010-03-11 at 22:39 -0700, Stephen Hemminger wrote:
 - Ed Ravin era...@panix.com wrote:
 
  I'm using the Vyatta kenwood Linux distribution, which is currently
  at 2.6.31-1.  I upgraded to their latest version, and began seeing
  kernel
  panics shortly after starting to use ssh/scp on the network connected
  to
  an e100 NIC.  I was able to reproduce the problem immediately after
  booting up - sometimes it even crashed during the boot.
  
  One of the crash logs is attached.

Ed, thanks for the report; it looks like these patches introduced a new
problem.  e100 hardware has a tricky data structure that seems to cause
problems on some CPU architectures (particularly ARM).

  
  Since the problem seemed to be related to e100.c, I reverted the two
  commits to e100.c that had taken place since I last built the kernel
  for this box:
  
Author: Roger Oksanen roger.oksa...@cs.helsinki.fi
Date:   Fri Dec 18 20:18:21 2009 -0800
e100: Fix broken cbs accounting due to missing memset.
  
Author: Roger Oksanen roger.oksa...@cs.helsinki.fi
Date:   Sun Nov 29 17:17:29 2009 -0800
e100: Use pci pool to work around GFP_ATOMIC order 5 memory
  allocation failu
  
  I rebuilt the kernel and it's not panicking anymore.

So you reverted both, and it's good news that things are working again,
but can you try reverting just one or the other and let us know whether
things still break for you?

 The Vyatta kernel for 2.6.31 is based on the 2.6.31.10 + unionfs.
 These two patches came from the 2.6.31.10 -stable update.

This is the only report of this issue I have heard so far, so something
must be a little unique to your system or workload such that the driver
works mostly.

I'm looking more closely into the panic trace now, maybe I can figure it
out from there.

-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] Intel 2598EB 10-Gigabit AT dropped rx packet

2010-03-19 Thread Jesse Brandeburg
 --
 --
 Mirko Corosu
 Network and system administrator
 Computing Center
 Istituto Nazionale Fisica Nucleare
 Via Dodecaneso 33
 16146 Genova, Italy
 http://www.ge.infn.it
 --
 
 --
 If everything seems to be coming your way, you are probably in the wrong lane.
 --
 
 

-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] e1000_clean_tx_irq: Detected Tx Unit Hang

2010-03-04 Thread Jesse Brandeburg
On Thu, Mar 4, 2010 at 3:19 AM, Metal Thrashing Mad
thrash.d...@gmail.com wrote:
 Just read the mail from Nikita, about fixeep-82573-dspd.sh.

I didn't see that mail.  That script is only for 82573.

 Running the script returns -
 No appropriate hardware found for this fixup.

 Knowing full well that doing the following could render my card
 useless and void the warranty, I modified the script to return true
 for the model I have.

okay, but why?

 Running iperf with a total of 128 inbound connections with a -t of
 6000 a few times has not broken anything. Looks like this script may
 have fixed things. Iptraf was showing a consistent 80,xxx kbit/s.

did this test usually fail before?


 Here's an eeprom dump (after the script was run)

 ethtool -e eth0
 Offset          Values
 --          --
 0x0000          00 0e 0c c2 82 04 10 02 ff ff 00 10 ff ff ff ff
 0x0010          60 d2 03 00 0b 64 76 14 86 80 7c 10 86 80 85 b2

So the 0x85, one byte from the end of that line, changed from 0x84 when
you ran that script.
Looking in the manual for your 82541 posted at sourceforge, EEPROM
address map section, I see that the bit you changed is in word 0x0F:
0xb284 became 0xb285 (i.e. bit 0).
Bit zero is: reserved.
Looking into our internal documentation, that bit really shouldn't be
doing anything if you are at 1Gb/s link.

My guess is you're going to see the problem again.
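
For anyone following along, here is a minimal Python sketch of the word arithmetic above. It assumes the plain ethtool -e layout (an offset followed by sixteen hex bytes) and that EEPROM words are stored low byte first, which is what the 0xb284/0xb285 reading implies. The helper names are made up for illustration, and the first row's offset is taken to be 0x0000.

```python
def parse_eeprom_dump(text):
    """Turn `ethtool -e` hex-dump text into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an offset such as 0x0010; skip everything else.
        if not fields or not fields[0].lower().startswith("0x"):
            continue
        data.extend(int(byte, 16) for byte in fields[1:])
    return bytes(data)


def eeprom_word(data, index):
    """Return 16-bit word `index`, stored low byte first."""
    return data[2 * index] | (data[2 * index + 1] << 8)


# First two rows of the dump quoted in this mail.
DUMP = """\
0x0000          00 0e 0c c2 82 04 10 02 ff ff 00 10 ff ff ff ff
0x0010          60 d2 03 00 0b 64 76 14 86 80 7c 10 86 80 85 b2
"""

data = parse_eeprom_dump(DUMP)
before = 0xB284                      # value of word 0x0F before the script ran
after = eeprom_word(data, 0x0F)      # value read back from the dump
changed_bits = [bit for bit in range(16) if ((before ^ after) >> bit) & 1]
print(hex(after), changed_bits)
```

Run against the two rows above, this reports word 0x0F as 0xb285 with bit 0 as the only difference from 0xb284.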

 0x0020          dd 20 55 55 00 00 90 2f 00 32 12 00 20 1e 12 00
 0x0030          20 1e 12 00 20 1e 12 00 20 1e 09 00 00 02 00 00
 0x0040          0c 00 a6 93 0b 28 00 00 00 04 ff ff ff ff ff ff
 0x0050          ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
 0x0060          00 01 00 40 16 12 07 40 ff ff ff ff ff ff ff ff
 0x0070          ff ff ff ff ff ff ff ff ff ff ff ff ff ff d4 19

 If that doesn't show up correctly http://pastebin.ca/1822468

 Here's an ethtool -S

 ethtool -S eth0
 NIC statistics:
     rx_packets: 102700084
     tx_packets: 72630664
     rx_bytes: 136903466843
     tx_bytes: 44351377340
     rx_broadcast: 3743
     tx_broadcast: 93
     rx_multicast: 0
     tx_multicast: 6
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 3
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 2
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 821779
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 1573008
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 2
     rx_flow_control_xoff: 2
     tx_flow_control_xon: 71929941
     tx_flow_control_xoff: 71896901
     rx_long_byte_count: 136903466843
     rx_csum_offload_good: 102696253
     rx_csum_offload_errors: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0

 Another pastebin link for the above
 http://pastebin.ca/1822475

 If you need anymore hardware information to update that script, let me know.



Re: [E1000-devel] e1000_clean_tx_irq: Detected Tx Unit Hang

2010-03-03 Thread Jesse Brandeburg
On Mon, Mar 1, 2010 at 3:37 AM, Thrash Dude thrash.d...@gmail.com wrote:
 Seems to be a rather common issue with the e1000 module. I searched the
 archives back to 2005. Plenty of reports, no solutions.

There are some solutions, one of which is to try loading the driver
with TxDescriptorStep=4 TxDescriptors=1024


 The NIC does drop the link, PC does not hang. The link does become active
 again. Wouldn't be such an issue, although this PC is a file server for
 streaming audio and video files exported across nfs and cifs shares. Quite
 an annoying problem to get 55minutes into a movie to have the link die.

For the recent failures, were you streaming over CIFS or NFS? What
version of NFS? What client machine and OS did you test with? What
streaming software were you using to play the movie on the remote
machine?


 NOTE: No the link does not die with every movie. This seems to be
 completely random. I can flood the _server_ with 15 incoming connections
 continuously for 30 minutes and there's no problem. Or I can simply
 ping -c4 server and receive a Tx Unit Hang.

So maybe it's not actually related to traffic levels?

 Machine specs -
 Slackware x86_64 -current
 Pure Virgin Kernel 2.6.32.8 (have noticed issue with previous kernels)
 7GB Ram
 AMD RS780

 Migrated same card to another machine to rule out +4GB question that is
 always. And another Chipset to test.
 Intel P45, 2GB Ram - same issue

This is actually a promising development, because we might have
something close to that system here. What slot did you plug the adapter
into? What is the barcode number on your adapter (XX-XXX)? The other
(bad) option is that, since the problem follows the adapter, it could be
the adapter itself.

Have you double-checked the cooling of the NIC? Do you have another
identical NIC you can try? You can probably get warranty support for
the one you have, to get a replacement.

 VMware Player is currently installed. Issue presents itself when VMware is
 removed and/or VMware modules are not loaded.

 See below for modinfo, dmesg, IRQ's, lspci and some ethtool output

 Partial dmesg
 [43503.704198] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
 [43503.704199]   Tx Queue             0 [43503.704200]   TDH
     c7 [43503.704201]   TDT                  da [43503.704201]
 next_to_use          da [43503.704202]   next_to_clean        c8
 [43503.704202] buffer_info[next_to_clean] [43503.704203]   time_stamp
     1029335c6 [43503.704203]   next_to_watch        c9 [43503.704204]
  jiffies              102933c78 [43503.704205]   next_to_watch.status
 0 [43505.704209] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
 [43505.704211]   Tx Queue             0 [43505.704212]   TDH
     c7 [43505.704212]   TDT                  da [43505.704213]
 next_to_use          da [43505.704214]   next_to_clean        c8
 [43505.704214] buffer_info[next_to_clean] [43505.704215]   time_stamp
     1029335c6 [43505.704215]   next_to_watch        c9 [43505.704216]
  jiffies              102934448 [43505.704216]   next_to_watch.status
 0 [43507.704182] e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
 [43507.704183]   Tx Queue             0 [43507.704184]   TDH
     c7 [43507.704185]   TDT                  da [43507.704185]
 next_to_use          da [43507.704186]   next_to_clean        c8
 [43507.704186] buffer_info[next_to_clean] [43507.704187]   time_stamp
     1029335c6 [43507.704187]   next_to_watch        c9 [43507.704188]
  jiffies              102934c18 [43507.704189]   next_to_watch.status
 0

Wow, that's a mess; please fix your mail client next time. What I do
see in the above is that it appears to be a legitimate Tx hang. We have
some debug code that can help us diagnose it; would you be able to run
that?
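
As a reading aid, here is a minimal Python sketch that untangles the first hang record from the quoted dmesg and does the basic arithmetic on it. It assumes the values are hexadecimal (c7, da and friends suggest so) and uses the TX ring size of 256 reported by ethtool -g later in the same mail; the function name is made up for illustration.

```python
import re

# First record from the quoted dmesg, one field per line, timestamps dropped.
HANG = """\
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             0
  TDH                  c7
  TDT                  da
  next_to_use          da
  next_to_clean        c8
buffer_info[next_to_clean]
  time_stamp           1029335c6
  next_to_watch        c9
  jiffies              102933c78
  next_to_watch.status 0
"""


def parse_hang(text):
    """Collect single-token `name value` pairs, reading each value as hex."""
    fields = {}
    for line in text.splitlines():
        match = re.match(
            r"\s*(?:\[[\d.]+\]\s*)?([\w.]+)\s+([0-9a-fA-F]+)\s*$", line
        )
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


RING_SIZE = 256  # "TX: 256" under current hardware settings in ethtool -g

fields = parse_hang(HANG)
# Descriptors between the head (TDH) and tail (TDT) pointers.
pending = (fields["TDT"] - fields["TDH"]) % RING_SIZE
# How long the buffer at next_to_clean has been waiting, in jiffies.
age_jiffies = fields["jiffies"] - fields["time_stamp"]
print(pending, age_jiffies)
```

For this record that gives 19 descriptors between TDH and TDT and a buffer age of 1714 jiffies. The second and third reports are 2000 jiffies and about 2.0 seconds of dmesg time apart, which suggests HZ=1000, so the buffer had been waiting roughly 1.7 seconds when the first hang was logged.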


 modinfo e1000|grep ^version
 version:        7.3.21-k5-NAPI


 ethtool eth0
 Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: umbg
        Wake-on: g
        Current message level: 0x0007 (7)
        Link detected: yes



 ethtool -i eth0
 driver: e1000
 version: 7.3.21-k5-NAPI
 firmware-version: N/A
 bus-info: :02:06.0



 ethtool -g eth0
 Ring parameters for eth0:
 Pre-set maximums:
 RX:             4096
 RX Mini:        0
 RX Jumbo:       0
 TX:             4096
 Current hardware settings:
 RX:             256
 RX Mini:        0
 RX Jumbo:       0
 TX:             256



 ethtool -k eth0
 Offload parameters for eth0:
 rx-checksumming: on

Re: [E1000-devel] New thread: page allocation failure with E1000 (seems to be reproducible)

2010-03-01 Thread Jesse Brandeburg
in the future please copy net...@vger.kernel.org on networking issues.

On Mon, Mar 1, 2010 at 9:34 AM, Richard Hartmann
richih.mailingl...@gmail.com wrote:
 Hi Jesse,

 the memory allocation failures (order:0), while unexpected, are not fatal,
 and the e1000 driver is written to handle failures during allocation.

 Does something else happen to the system after this or does operation 
 continue?

 I can not be sure, but I _think_ some bogus data made it into userspace.
 I did have some binary in a text string I received and logged, which is a
 tad unusual.

hm, if that did occur it would be bad.  But it does sound like
operation continued, which is good.

 You might be able to try the sysctl tweak to reserve a little more
 memory for driver allocations.
 # sysctl vm.min_free_kbytes
 # sysctl -e vm.min_free_kbytes=double what you have

 I will try that, thanks.
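
A minimal Python sketch of the "double what you have" step, in case it helps. It only builds the command string; the helper name and the sample value 4096 are placeholders, and the real starting value would come from sysctl vm.min_free_kbytes (or /proc/sys/vm/min_free_kbytes) on the affected machine.

```python
def doubled_min_free_command(current_text):
    """Build the sysctl command that doubles vm.min_free_kbytes."""
    current = int(current_text.strip())
    return f"sysctl -w vm.min_free_kbytes={2 * current}"


# 4096 is a placeholder, not a value reported in this thread.
print(doubled_min_free_command("4096\n"))
```

The change made this way does not survive a reboot, which is convenient for an experiment like this one.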


 have you increased the number of rx/tx descriptors in use by e1000?

 No. Should I?

I wouldn't recommend it if you're already having issues getting
order:0 allocations, it would just make the problem worse.  I wanted
to make sure you were not.

Jesse



Re: [E1000-devel] e1000e-1.1.2 Compile errors with 2.4.37 and gcc 2.95.3

2010-02-09 Thread Jesse Brandeburg
removed netdev,

On Tue, Feb 9, 2010 at 3:21 AM, Ben Hutchings bhutchi...@solarflare.com wrote:
 On Tue, 2010-02-09 at 10:58 +0100, Marco Schwarz wrote:
 Hi,

 I get the following output when trying to compile e1000e-1.1.2 with Linux 
 Kernel 2.4.37 and gcc 2.95.3 (e1000-8.0.16 compiles fine):
 [...]

 netdev only deals with recent 2.6 kernels.  I'm amazed that Intel still
 wastes time on 2.4.

this didn't make it to my intel address for some reason.

I'll figure out the build issues and we may re-release a driver with the fix.



Re: [E1000-devel] Ixgbe and VLAN filtering

2010-01-26 Thread Jesse Brandeburg
 Port Server Adapter, 82598 
 controller
 OpenSuse 11.2, 64 bit, kernel version: 2.6.31.5
 ixgbe-2.0.44.14-NAPI
 
 I have 4 ixgbe interfaces (eth1, eth2, eth3, eth4), and I 
 would like to bridge them.
 I would like to bridge only some specified VLANs (101 and 
 102).
 
 I have to cope with mass traffic, so effective VLAN filtering 
 is very important.
 I would like to use the 82598 controller's HW VLAN filtering.
 
 I use the following script:
 
 input_eths="eth1 eth2 eth3 eth4"
 input_vlans="101 102"
 
 echo
 echo "Setting up input interfaces ..."
 # Bring up each physical port, then add one VLAN sub-interface per VLAN ID.
 for eth in $input_eths
 do
   echo "$eth"
   ifconfig "$eth" 0.0.0.0 up
   for vlan in $input_vlans
   do
     vconfig add "$eth" "$vlan"
     ifconfig "$eth.$vlan" up
   done
 done
 
 echo
 echo "Setting up bridge ..."
 # Attach every VLAN sub-interface to a single bridge.
 brctl addbr br0
 for eth in $input_eths
 do
   for vlan in $input_vlans
   do
     brctl addif br0 "$eth.$vlan"
   done
 done
 ifconfig br0 up
 
 My question is the following:
 If I use the vconfig utility to specify VLANs, does it result in
 HW VLAN filtering in the 82598 controller,
 or is VLAN filtering done only in Linux (in the
 ixgbe driver or in the Linux network stack)?
 
 Thanks,
 Gyorgy Szaniszlo
 Ericsson Hungary Ltd.
 
 
 
 
 
 Yes, when using vconfig, the ixgbe driver is given the vlan 
 information and sets the appropriate bits in the HW to do the filtering in 
 the hardware.
 
 sln
 
 
 ==
 Mr. Shannon Nelson          LAN Access Division, Intel Corp.
 shannon.nel...@intel.com    I don't speak for Intel
 (503) 712-7659              Parents can't afford to be squeamish.
 ==

-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] [Bugme-new] [Bug 14748] New: e1000e NIC not working after reboot

2010-01-26 Thread Jesse Brandeburg
On Mon, Dec 7, 2009 at 2:01 PM, Brandeburg, Jesse
jesse.brandeb...@intel.com wrote:
 On Mon, 7 Dec 2009, Andrew Morton wrote:
  When I power up my system the NIC is working properly.
  After every reboot the NIC is not working. I mean the eth0 is created, but
  neither dhcpcd gets IP nor static setup helps

 We have a userspace tool called ethregs downloadable from
 http://downloads.sourceforge.net/project/e1000/Register%20Dump%20Tool/1.7.2/ethregs-1.7.2.tar.gz?use_mirror=iweb

 if it is not too much trouble can you build this tool and run it before
 (when the port is working) and after (when the link didn't come up)

 You can attach them to the bug, though replying to this thread would be best.

I've looked at the ethregs dumps. The good news is that the hardware
appears to self-initialize successfully, but for ethregs-fails.txt, did
you load the driver? It appears you did not, or at least didn't do
# ip link set eth0 up
# ethregs > regs.txt

I also looked at the lspci -vvv information. In both cases MSI was
enabled, but in the failing case the value in the data field for the MSI
vector is different, which seems a little strange, though I'm not sure
whether it is responsible for the failure.

if the driver was loaded, and failed dhcp, what happens when you run
ethtool -t eth0 offline?

when the driver is loaded, and the dhcp fails, can you assign an
address manually (and bring the interface up) and have it work?

One more thing, please: can you send the output of cat /proc/interrupts
taken twice, 10 seconds apart, while the driver is loaded and the port
is UP but not working? dhcpcd and dhclient both tend to put the port
DOWN after they fail to get an address, which is why you may need to run
the # ip link command above before gathering /proc/interrupts.
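
A minimal Python sketch of comparing the two snapshots, for what it's worth. The two sample snapshots below are invented purely to show the shape of the data (they are not from the reporter's machine), and the function name is made up; the idea is simply to sum the per-CPU counters on the eth0 row and see whether the total moved in 10 seconds.

```python
def irq_total(snapshot, ifname):
    """Sum the per-CPU counters on every /proc/interrupts row naming `ifname`."""
    total = 0
    for line in snapshot.splitlines():
        fields = line.split()
        # Shared IRQs list several names separated by commas.
        names = [field.rstrip(",") for field in fields[1:]]
        if ifname not in names:
            continue
        for field in fields[1:]:
            if not field.isdigit():
                break            # counters end where the controller name starts
            total += int(field)
    return total


# Hypothetical snapshots taken 10 seconds apart.
FIRST = """\
           CPU0       CPU1
 16:        120         95   IO-APIC-fasteoi   uhci_hcd:usb1
 28:       1500        700   PCI-MSI-edge      eth0
"""
SECOND = """\
           CPU0       CPU1
 16:        131         99   IO-APIC-fasteoi   uhci_hcd:usb1
 28:       1500        700   PCI-MSI-edge      eth0
"""

delta = irq_total(SECOND, "eth0") - irq_total(FIRST, "eth0")
print(delta)
```

A delta of zero while the port is UP and traffic is being attempted would mean no interrupts are reaching the driver, which fits with the question about the MSI data field above.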

is your bios up to date?

Thanks, and sorry for the delay. Let's see if we can figure out what is up.

Jesse



Re: [E1000-devel] 82567V-3 PXE boot

2010-01-11 Thread Jesse Brandeburg
On Mon, 2010-01-11 at 01:12 -0800, kelon.hu...@emerson.com wrote:
 Hi, Support:
 
 These days I have encountered an issue and urgently need your support.
 When I make a new initrd.img of RHEL5.3 in order to boot from PXE, the
 82567V-3's driver cannot be found during the RHEL5.3 installation. I made
 sure that I amended files such as modules.alias, module-info and pci.ids,
 and I added the 82567V-3's driver, e1000e.ko, to overwrite the old
 e1000e.ko in the file modules.cgz. Could you please give me some support?
 Is the 82567V-3 not supported for PXE boot? Thanks!!
 
 BTW, e1000e.ko is made from e1000e-1.1.2.tar.

Did you force the module to be loaded with --preload in the initrd
creation?

In Red Hat, mkinitrd has a --preload=e1000e option to force a module
to be loaded out of the initrd. You may also need --with=e1000e.


-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] Excessive frame dropping on 82574L

2009-12-22 Thread Jesse Brandeburg
On Mon, 2009-12-21 at 18:53 -0700, Richard Scobie wrote:
 I have a low end server, Core 2 Duo 2.8, 4GB used to backup using rsync 
 over a 82574L interface. Kernel 2.6.30.9-102.fc11.x86_64 (e1000e 
 0.3.3.4-k4). It is using MSI-X interrupts.
 
 It's suffering somewhat due to dropping frames:
 
 RX packets:294914332 errors:0 dropped:95203 overruns:0 frame:0
 TX packets:355842341 errors:0 dropped:0 overruns:0 carrier:0
 
 and ethtool shows rx_missed_errors: 95203.
 
 Googling shows these are caused by the RX FIFO filling up.

Hi Richard, can you give the whole ethtool -S output?  depending on the
value of rx_no_buffer_count, you may be able to do something.

The other thing to send is the output of lspci -vvv for your system, I'm
curious if ASPM is enabled for the ethernet port or its upstream port.

The other thing we may be able to do is provide a patch to enable GRO if
at all possible (which should help significantly if it is not already
enabled,) you can check with ethtool -k ethX, but I guess it may already
be on.

Is flow control enabled to your switch?  Are you using jumbo frames?
There was a fifo (flow control) configuration issue in several versions
of the e1000e driver in the kernel.  If that was the case disabling flow
control might help you, ethtool -A ethX autoneg off rx off tx off

ethtool -G ethX rx 4096 will max out the number of rx descriptors.

You may also benefit from decreasing the interrupt rate using
ethtool -C ethX rx-usecs 125 (8000 interrupts per second), because your
workload is not latency sensitive.
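
The 8000 figure is just the reciprocal of the interval. A tiny Python sketch of the arithmetic, assuming rx-usecs is the minimum spacing between receive interrupts (so the result is an upper bound, not a guaranteed rate); the function name is made up.

```python
def max_interrupts_per_second(rx_usecs):
    """Upper bound on interrupts per second for a given rx-usecs spacing."""
    return 1_000_000 // rx_usecs


print(max_interrupts_per_second(125))
```

Larger rx-usecs values trade latency for fewer interrupts, which is why this suits a bulk transfer such as rsync.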

Please also provide /proc/interrupts and ethtool -e ethX, and if you are
feeling gung-ho, the output of the ethregs utility available at
sourceforge (you'll have to build it) in the Register Dump utility
section.
-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI

2009-11-12 Thread Jesse Brandeburg
-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 
 39426, win 382, length 0
 17:21:51.463283 IP (tos 0x0, ttl 64, id 101, offset 0, flags [DF], proto TCP
 (6), length 1500)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
 17:21:51.463288 IP (tos 0x0, ttl 64, id 56746, offset 0, flags [DF], proto
 TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack
 39426, win 382, length 0
 17:21:52.484305 IP (tos 0x0, ttl 64, id 56747, offset 0, flags [DF], proto
 TCP (6), length 1500)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length
 1460
 17:21:52.484327 IP (tos 0x0, ttl 64, id 102, offset 0, flags [DF], proto TCP
 (6), length 40)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack
 1, win 382, length 0
 17:21:52.484332 IP (tos 0x0, ttl 64, id 56748, offset 0, flags [DF], proto
 TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack
 39426, win 382, length 0
-- 
Jesse Brandeburg
This email sent via Evolution, powered by Linux




Re: [E1000-devel] Intel e1000e: eth0: Detected Tx Unit Hang

2009-01-05 Thread Jesse Brandeburg
On Thu, Jan 1, 2009 at 11:56 AM, Soeren Sonnenburg ker...@nn7.de wrote:
 Dear list,

 I just recently observed a strange problem with an onboard 82567LF-2
 Intel ethernet controller. It completely stopped working and this
 desktop machine required a powerdown to get it to work again. This
 happened with 2.6.27.10. However, the machine was working for weeks
 before using an older kernel version (2.6.27.*, no binary modules, intel
 skyburg mainboard, e8400 c2d cpu).

 Relevant details follow, can someone make sense of this?

What does cat /proc/interrupts say?

Please also include at least ethtool -e eth0 length 256. Do you have
the IOMMU enabled? Your full dmesg would let me know.

Also, please double-check that you're running the latest BIOS for your motherboard.

 00:19.0 Ethernet controller: Intel Corporation 82567LF-2 Gigabit Network 
 Connection

 $ dmesg | grep relevant parts

 Intel(R) PRO/1000 Network Driver - version 7.3.20-k3-NAPI
 e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
 :00:19.0: eth0: Intel(R) PRO/1000 Network Connection
 Intel(R) Gigabit Ethernet Network Driver - version 1.2.45-k2

 :00:19.0: eth0: (PCI Express:2.5GB/s:Width x1) XX:XX:XX:XX:XX:XX
 :00:19.0: eth0: Intel(R) PRO/1000 Network Connection
 :00:19.0: eth0: MAC: 5, PHY: 8, PBA No: ff-0ff
 ADDRCONF(NETDEV_UP): eth0: link is not ready
 :00:19.0: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
 ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [...]
 [uptime of a couple of days, with lots of network i/o]
 [...]
 saa7146 (1) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
 saa7146 (1) saa7146_i2c_writeout [irq]: timed out waiting for end of xfer
 :00:19.0: eth0: Detected Tx Unit Hang:
  TDH                  ff
  TDT                  1
  next_to_use          1
  next_to_clean        ff
 buffer_info[next_to_clean]:
  time_stamp           1104ec2c3
  next_to_watch        ff
  jiffies              1104ec8f0
  next_to_watch.status 0

snip lots of hangs...

what is the frequency of the hangs?  What kind of traffic are you using?



Re: [E1000-devel] how to repair correupted EEPROM/NVM?

2008-08-29 Thread Jesse Brandeburg
On Fri, 2008-08-29 at 11:13 +0200, Pierre Ossman wrote:
  Brandeburg, Jesse [EMAIL PROTECTED] wrote:
  
   You have contacted your laptop vendor and told them about this right?
   
  
  I've tried, but they're currently giving me the whole "have you checked
  the cable" runaround. I'll see if I can get anyone on the phone today...
  
 
 No luck for a new EEPROM image. All they could do is replace the entire
 motherboard, which means they want to have access to the machine for two
 weeks. Not really a workable solution as I need it on a daily basis...

yeah, I certainly understand that.

 I'll try to get my paws on that other R61 and copy its EEPROM. Is there
 anything other than the EEPROM that are specific to each machine? The
 machines are identical hardware-wise. I need to know what to change in
 his image before I burn it into my device.

The only change should be the MAC address, I would only mess with the
first few bytes.

I'm concerned that you're actually having some other driver problem that
isn't getting reported, like a weird semaphore issue or something. Can
you build the e1000e-0.4.1.7.tar.gz driver from sourceforge and try it?
While you're doing that, please build (and install if you so choose)
with this patch:


--- e1000_osdep.h~  2008-08-20 15:03:54.0 -0700
+++ e1000_osdep.h   2008-08-29 09:02:05.0 -0700
@@ -63,8 +63,8 @@
 #define ETH_ADDR_LEN   ETH_ALEN
 

-#define DEBUGOUT(S)
-#define DEBUGOUT1(S, A...)
+#define DEBUGOUT(S) printk(KERN_DEBUG S)
+#define DEBUGOUT1(S, A...) printk(KERN_DEBUG S, A)
 
 #define DEBUGFUNC(F) DEBUGOUT(F "\n")
 #define DEBUGOUT2 DEBUGOUT1

when you rebuild and load the driver it will log a ton of stuff, please
send it in a reply.

The thing that most disturbed me is that you said your BIOS could still
read the MAC address (if it wasn't pulling it from somewhere else).
This indicates that it is likely your eeprom is still there and intact.

Jesse




Re: [E1000-devel] Port showing link down at random times

2008-08-26 Thread Jesse Brandeburg
On Wed, 2008-08-27 at 09:26 +1000, Leigh Sharpe wrote:
 Hi All,
  I'm having a problem with my e1000 cards intermittently shutting down.
 This is happening across multiple cards, on multiple systems. I get the
 following messages in the syslog at the time:
  
 Aug 26 22:58:36 ElizaQOS kernel: e1000: eth13: e1000_watchdog: NIC Link
 is Down
 
 ethtool and mii-tool both show a different status for the affected port:
 ---
 [EMAIL PROTECTED]:~$ sudo mii-tool -v eth13
 eth13: negotiated 100baseTx-FD flow-control, link ok
   product info: vendor 00:50:43, model 2 rev 5
   basic mode:   autonegotiation enabled
   basic status: autonegotiation complete, link ok
   capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
   advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
 flow-control
   link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
 flow-control
 

what does ethtool -i eth13 say?

also, can you tell us if you see the Intel AMT bios pre-boot screen come
up?  we have heard lots of reports of people having interaction problems
with AMT, but believe the driver to have solved most of them now.

ethtool -i will tell us, but you didn't report your driver version. Would
you be willing to try the e1000e driver from sourceforge? Version
0.4.1.7 would be the best.

you would have to manually remove e1000 and install e1000e in its place,
changing modprobe.conf is probably necessary.

Another option is to run a more recent kernel, but that is much harder
to get set up than just upgrading our driver.

