Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-05-18 Thread German Gomez
The problem is that I cannot fix it using setpci, as the Ethernet
controller does not support the ASPM capability. Should I disable it
with pcie_aspm=off?
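
One way to check this is to look for a PCI Express capability in the lspci output; as far as I know, setpci can only resolve CAP_EXP on a device that has one. The sketch below is illustrative only: the helper name has_express_cap is made up and is not part of pciutils.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "lspci -vvv" text on stdin lists
# a PCI Express capability, e.g. "Capabilities: [e0] Express (v1) ...".
has_express_cap() {
    grep -q 'Capabilities: \[[0-9a-f]*] Express'
}
```

It could be used as `lspci -vvv -s 00:19.0 | has_express_cap && echo "Express capability present"`. lspci usually needs root to list capabilities.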

Regards,

- german

On Wed, 18 May 2011 11:55:01 +0100
Nix  wrote:

> On 17 May 2011, German Gomez stated:
> 
> > Sorry for replying to an old thread, but I'm getting exactly the
> > same problem with a 
> >
> > 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit
> > Network Connection (rev 03)
> 
> FWIW I'm still getting it, and still 'fixing' it with a setpci after
> boot and line stabilization (if I try it before the line comes up ASPM
> turns itself straight on again). I just haven't had time to forward
> the problem to the PCI guys.
> 
> Sorry for my dilatoriness, but I thought I was alone. If I'm not,
> this adds a bit of impetus :)
> 


--
What Every C/C++ and Fortran developer Should Know!
Read this article and learn how Intel has extended the reach of its 
next-generation tools to help Windows* and Linux* C/C++ and Fortran 
developers boost performance applications - including clusters. 
http://p.sf.net/sfu/intel-dev2devmay
___
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired


Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-05-18 Thread Nix
On 17 May 2011, German Gomez stated:

> Sorry for replying to an old thread, but I'm getting exactly the same
> problem with a 
>
> 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network 
> Connection (rev 03)

FWIW I'm still getting it, and still 'fixing' it with a setpci after
boot and line stabilization (if I try it before the line comes up ASPM
turns itself straight on again). I just haven't had time to forward the
problem to the PCI guys.

Sorry for my dilatoriness, but I thought I was alone. If I'm not,
this adds a bit of impetus :)
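
A minimal sketch of that "setpci after the line comes up" workaround might look like the following. The function names, the 60-second timeout, and the example interface and slot are all assumptions for illustration. The value 40 is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: the interface name, PCI address and timeout are assumptions.

# Wait until the given sysfs carrier file reads 1, polling once a second.
# Returns 0 when the link is up, 1 if the number of tries runs out.
wait_for_carrier() {
    carrier_file=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        # The file can be unreadable while the interface is down.
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Not called automatically; run as root once the interface is configured.
disable_aspm_when_link_up() {
    iface=$1    # e.g. eth0
    slot=$2     # e.g. 02:00.0
    wait_for_carrier "/sys/class/net/$iface/carrier" 60 &&
        setpci -s "$slot" CAP_EXP+10.b=40
}
```

Something like `disable_aspm_when_link_up eth0 02:00.0` from a late boot script would then approximate the manual procedure. An extra delay after carrier detection may still be needed if the link renegotiates.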

-- 
NULL && (void)



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-05-17 Thread German Gomez
Sorry for replying to an old thread, but I'm getting exactly the same
problem with a 

00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network 
Connection (rev 03)

in a Dell Precision M4400 notebook. As was noted in this thread, the
network card works perfectly with 2.6.35.13 but freezes with 2.6.37.6.
It happens almost daily, and I need to reboot the computer because the
device fails: although ifconfig shows that I have an eth0 present,
"ethtool eth0" reports no such device. The main problem is the output
of lspci -vvv -s 00:19.0:

sudo lspci -vvv -s 00:19.0
00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network 
Connection (rev 03)
Subsystem: Dell Device 0250
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory 
Controller Hub (rev 07)
Subsystem: Dell Device 0250
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- 
Kernel modules: intel-agp

00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express 
Graphics Port (rev 07) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [88] Subsystem: Dell Device 0250
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
Address:   Data: 
Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, 
L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- 
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- 
TransPend-
LnkCap: Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency 
L0 <256ns, L1 <4us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- 
CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ 
DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ 
Surprise-
Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- 
LinkChg-
Control: AttnInd Off, PwrInd On, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ 
Interlock-
Changed: MRL- PresDet+ LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- 
CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID , PMEStatus- PMEPending-
Capabilities: [100 v1] Virtual Channel
Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
Arb:Fixed+ WRR32- WRR64- WRR128-
Ctrl:   ArbSelect=Fixed
Status: InProgress-
VC0:Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb:Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [140 v1] Root Complex Link
Desc:   PortNumber=02 ComponentID=01 EltType=Config
Link0:  Desc:   TargetPort=00 TargetComponent=01 AssocRCRB- 
LinkType=MemMapped LinkValid+
Addr:   feda5000
Kernel modules: shpchp

00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network 
Connection (rev 03)
Subsystem: Dell Device 0250
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
[The rest of the lspci output, covering further devices, is garbled in the archive.]

Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-09 Thread Allan, Bruce W
OK, that's great news!  I'll check into why ASPM is not getting set
correctly by the kernel.

>-Original Message-
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Wednesday, February 09, 2011 12:24 PM
>To: Allan, Bruce W
>Cc: Jesse Brandeburg; e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>On 2 Feb 2011, n...@esperi.org.uk uttered the following:
>
>> On 1 Feb 2011, Bruce W. Allan stated:
>>
>>>>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com]
>>>>Please, for our benefit, file a bug at e1000.sf.net (if you have not
>>>>already) so you can attach the .config and full dmesg file from a
>>>>non-working kernel, also please attach the full lspci -vvv output.
>>
>> Done! Bug 3170405. There's lspci output from a working kernel too.
>
>Update: After a week of testing, it does indeed appear that
>
>setpci -s 02:00.0 CAP_EXP+10.b=40
>
>(where 02:00.0 is the PCI address of the 82574L which is running in
>gigabit mode) fixes it in 2.6.36+: I have had no recurrences of sudden
>NIC 0xff register death, and without this they happened on an almost
>daily and sometimes hourly basis. Of course the question is why the
>code to flip this automatically, isn't...



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-09 Thread Nix
On 2 Feb 2011, n...@esperi.org.uk uttered the following:

> On 1 Feb 2011, Bruce W. Allan stated:
>
>>>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com]
>>>Please, for our benefit, file a bug at e1000.sf.net (if you have not
>>>already) so you can attach the .config and full dmesg file from a
>>>non-working kernel, also please attach the full lspci -vvv output.
>
> Done! Bug 3170405. There's lspci output from a working kernel too.

Update: After a week of testing, it does indeed appear that

setpci -s 02:00.0 CAP_EXP+10.b=40

(where 02:00.0 is the PCI address of the 82574L which is running in
gigabit mode) fixes it in 2.6.36+: I have had no recurrences of sudden
NIC 0xff register death, and without this they happened on an almost
daily and sometimes hourly basis. Of course the question is why the
code to flip this automatically, isn't...



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan stated:

>>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com]
>>Please, for our benefit, file a bug at e1000.sf.net (if you have not
>>already) so you can attach the .config and full dmesg file from a
>>non-working kernel, also please attach the full lspci -vvv output.

Done! Bug 3170405. There's lspci output from a working kernel too.

> Actually, when CONFIG_PCIEASPM=n the e1000e driver will forcibly disable ASPM
> by directly writing to the PCI config space instead of using the kernel
> API.

... but the PCI configuration space shows that even in 2.6.35.x (which did
not have ASPM kernel support as I recall, and which works for me perfectly
well with no freezing NICs), L0s and L1 are active. This is getting more
confusing by the moment.

>   So, I doubt that is how CONFIG_PCIEASPM is set.  But I agree, a bug filed
> on SourceForge should be done for better tracking of this issue and an
> owner can be properly assigned.

Sorry for not doing it initially, but I have an allergy to SourceForge's
impossibly slow and clumsy user interface :) it seems to have improved
a bit recently though.



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-02 Thread Nix
On 1 Feb 2011, Bruce W. Allan spake thusly:

>>-Original Message-
>>From: Nix [mailto:n...@esperi.org.uk]
>>I am... confuzzled, but am happy to try turning L0s/L1 off (if I can
>>figure out how to do it: setpci is... not the most friendly of tools
>>and I've never even looked at its manpage before).
>
> ASPM is enabled/disabled via bits 1:0 of byte 16 in the Express Endpoint
> capability register.  First see what is in this byte with the following:
>
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b
>
> where [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] is the slot information
> for your 82574.  I'm guessing that command will return 43 (hex) to indicate
> ASPM L0s (bit 0) and ASPM L1 (bit 1) are both enabled based on your previous

Quite so.

More confusingly, they are both enabled on every kernel I have access
to, from 2.6.37 right back to 2.6.35.4 (which does not freeze with
in-tree nor out-of-tree drivers). Possibly something else is keeping it
active enough to stay awake in that kernel?

> lspci output.  Now, re-write the byte with bits 1:0 set to 10b (or 42 hex)
> to disable ASPM L0s:
>
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=42
>
> or 00b (40 hex) to disable both ASPM L0s and L1:
>
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=40
>
> and verify with 'lspci -vvv' that ASPM L0s [and L1] are disabled.

LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

So, lspci is not lying to us. (It looks to me like I can still get the
benefits of ASPM on the NIC that is only running at 100Mb/s: it's never
once frozen. So for now I've left it on on that NIC, and turned it off
on the gigabit one.)


Let us see if the NIC freezes in the next week or so.



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-02-01 Thread Allan, Bruce W
>-Original Message-
>From: Jesse Brandeburg [mailto:jesse.brandeb...@gmail.com]
>Sent: Monday, January 31, 2011 11:48 PM
>To: Nix
>Cc: Allan, Bruce W; e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>Please, for our benefit, file a bug at e1000.sf.net (if you have not
>already) so you can attach the .config and full dmesg file from a
>non-working kernel, also please attach the full lspci -vvv output.
>
>The reason I'm asking for this is that the kernel may actually be
>configured to not do aspm at all (CONFIG_PCIEASPM=n), but it still is
>"helpful" by printing strings like it did something[1]
>
>[1] http://lxr.linux.no/linux+v2.6.37/include/linux/pci-aspm.h#L41

Actually, when CONFIG_PCIEASPM=n the e1000e driver will forcibly disable ASPM
by directly writing to the PCI config space instead of using the kernel
API.  So, I doubt that is how CONFIG_PCIEASPM is set.  But I agree, a bug
should be filed on SourceForge for better tracking of this issue and so
that an owner can be properly assigned.




Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Jesse Brandeburg
On 1/31/2011 4:06 PM, Allan, Bruce W wrote:
>> -Original Message-
>> From: Nix [mailto:n...@esperi.org.uk]
>> Sent: Monday, January 31, 2011 3:31 PM
>> To: Allan, Bruce W
>> Cc: e1000-devel@lists.sourceforge.net
>> Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes 
>> continue in
>> 2.6.37
>>
>> On 31 Jan 2011, Bruce W. Allan spake thusly:
>>
>>>> From: Nix [mailto:n...@esperi.org.uk]
>>>> I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
>>>> the lspci output is *exactly the same*, i.e. even there lspci claims that
>>>> ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
>>>> state persists across a poweroff, the problem disappears upon a simple
>>>> reboot into 2.6.35.4, and does not recur in that kernel release.
>>>
>>> Which kernel versions?  The above mentioned are all the same???
>>
>> Yes. 2.6.35.4..2.6.37 have no differences whatsoever in their lspci output
>> for my 82574L cards.
>>
>> I am... confuzzled, but am happy to try turning L0s/L1 off (if I can
>> figure out how to do it: setpci is... not the most friendly of tools
>> and I've never even looked at its manpage before).
> 
> ASPM is enabled/disabled via bits 1:0 of byte 16 in the Express Endpoint
> capability register.  First see what is in this byte with the following:
> 
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b
> 
> where [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] is the slot information
> for your 82574.  I'm guessing that command will return 43 (hex) to indicate
> ASPM L0s (bit 0) and ASPM L1 (bit 1) are both enabled based on your previous
> lspci output.  Now, re-write the byte with bits 1:0 set to 10b (or 42 hex)
> to disable ASPM L0s:
> 
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=42
> 
> or 00b (40 hex) to disable both ASPM L0s and L1:
> 
> # setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=40
> 
> and verify with 'lspci -vvv' that ASPM L0s [and L1] are disabled.

Please, for our benefit, file a bug at e1000.sf.net (if you have not
already) so you can attach the .config and full dmesg file from a
non-working kernel, also please attach the full lspci -vvv output.

The reason I'm asking for this is that the kernel may actually be
configured to not do ASPM at all (CONFIG_PCIEASPM=n), but it still is
"helpful" by printing strings like it did something[1]

[1] http://lxr.linux.no/linux+v2.6.37/include/linux/pci-aspm.h#L41



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Allan, Bruce W
>-Original Message-
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Monday, January 31, 2011 3:31 PM
>To: Allan, Bruce W
>Cc: e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>On 31 Jan 2011, Bruce W. Allan spake thusly:
>
>>>From: Nix [mailto:n...@esperi.org.uk]
>>>I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
>>>the lspci output is *exactly the same*, i.e. even there lspci claims that
>>>ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
>>>state persists across a poweroff, the problem disappears upon a simple
>>>reboot into 2.6.35.4, and does not recur in that kernel release.
>>
>> Which kernel versions?  The above mentioned are all the same???
>
>Yes. 2.6.35.4..2.6.37 have no differences whatsoever in their lspci output
>for my 82574L cards.
>
>I am... confuzzled, but am happy to try turning L0s/L1 off (if I can
>figure out how to do it: setpci is... not the most friendly of tools
>and I've never even looked at its manpage before).

ASPM is enabled/disabled via bits 1:0 of byte 16 in the Express Endpoint
capability register.  First see what is in this byte with the following:

# setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b

where [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] is the slot information
for your 82574.  I'm guessing that command will return 43 (hex) to indicate
ASPM L0s (bit 0) and ASPM L1 (bit 1) are both enabled based on your previous
lspci output.  Now, re-write the byte with bits 1:0 set to 10b (or 42 hex)
to disable ASPM L0s:

# setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=42

or 00b (40 hex) to disable both ASPM L0s and L1:

# setpci -s [[[[<domain>]:]<bus>]:][<slot>][.[<func>]] CAP_EXP+10.b=40

and verify with 'lspci -vvv' that ASPM L0s [and L1] are disabled.
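
The bit arithmetic described above can be sketched as follows. The helper names aspm_decode and aspm_clear are made up for illustration and are not part of pciutils. They only do arithmetic on a hex byte as printed by setpci and never touch hardware.

```shell
#!/bin/sh
# Hypothetical helpers for the Link Control byte at CAP_EXP+10.

# Decode bits 1:0 of the byte (two hex digits, no 0x prefix).
aspm_decode() {
    val=$(( 0x$1 ))
    case $(( val & 3 )) in
        0) echo "Disabled" ;;
        1) echo "L0s" ;;
        2) echo "L1" ;;
        3) echo "L0s L1" ;;
    esac
}

# Clear the given ASPM bits (mask 1 = L0s, 2 = L1, 3 = both) and print
# the new byte as two hex digits, leaving all other bits untouched.
aspm_clear() {
    val=$(( 0x$1 ))
    mask=$2
    printf '%02x\n' $(( val & ~mask & 0xff ))
}
```

With a byte that reads back as 43, `aspm_clear 43 1` prints 42 and `aspm_clear 43 3` prints 40, matching the values given above. Computing the value this way rather than hard-coding it keeps any other bits in the byte as they were.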




Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan spake thusly:

>>From: Nix [mailto:n...@esperi.org.uk]
>>I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
>>the lspci output is *exactly the same*, i.e. even there lspci claims that
>>ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
>>state persists across a poweroff, the problem disappears upon a simple
>>reboot into 2.6.35.4, and does not recur in that kernel release.
>
> Which kernel versions?  The above mentioned are all the same???

Yes. 2.6.35.4..2.6.37 have no differences whatsoever in their lspci output
for my 82574L cards.

I am... confuzzled, but am happy to try turning L0s/L1 off (if I can
figure out how to do it: setpci is... not the most friendly of tools
and I've never even looked at its manpage before).



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Allan, Bruce W
>-Original Message-
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Monday, January 31, 2011 3:24 PM
>To: Allan, Bruce W
>Cc: e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>On 31 Jan 2011, Bruce W. Allan spake thusly:
>
>>>Because lspci simply reads the PCI configuration space (IIRC), I doubt it
>>>is reporting incorrect information.  The e1000e driver uses the kernel
>>>API to disable ASPM (when CONFIG_PCIEASPM is enabled in the kernel config
>>>otherwise it writes directly to the PCI configuration space to disable
>>>ASPM).  Assuming your kernel config has CONFIG_PCIEASPM enabled, my guess
>>>at this point would be there is something broken in the kernel.  With ASPM
>>>L0s enabled, the 82574 (and other parts supported by the driver) will most
>>>definitely have issues, so we need to find out what is broke and fix it.
>>
>> Since it does appear to be a problem with the kernel, a brute force method
>> to work around the issue is to manually disable ASPM (I suggest first try
>> disabling only ASPM L0s) using setpci.  If disabling ASPM L0s is not enough
>> then disable ASPM L1 in both the 82574 and upstream PCI bridge.
>
>I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
>the lspci output is *exactly the same*, i.e. even there lspci claims that
>ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
>state persists across a poweroff, the problem disappears upon a simple
>reboot into 2.6.35.4, and does not recur in that kernel release.

Which kernel versions?  The above mentioned are all the same???



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan spake thusly:

>>Because lspci simply reads the PCI configuration space (IIRC), I doubt it
>>is reporting incorrect information.  The e1000e driver uses the kernel
>>API to disable ASPM (when CONFIG_PCIEASPM is enabled in the kernel config
>>otherwise it writes directly to the PCI configuration space to disable
>>ASPM).  Assuming your kernel config has CONFIG_PCIEASPM enabled, my guess
>>at this point would be there is something broken in the kernel.  With ASPM
>>L0s enabled, the 82574 (and other parts supported by the driver) will most
>>definitely have issues, so we need to find out what is broke and fix it.
>
> Since it does appear to be a problem with the kernel, a brute force method
> to work around the issue is to manually disable ASPM (I suggest first try
> disabling only ASPM L0s) using setpci.  If disabling ASPM L0s is not enough
> then disable ASPM L1 in both the 82574 and upstream PCI bridge.

I'm not so sure anymore. In 2.6.35.4, everything works -- but in 2.6.35.4,
the lspci output is *exactly the same*, i.e. even there lspci claims that
ASPM L0s and L1 are enabled. This seems unlikely, since even if the L0s/L1
state persists across a poweroff, the problem disappears upon a simple
reboot into 2.6.35.4, and does not recur in that kernel release.



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Allan, Bruce W
>-Original Message-
>From: Allan, Bruce W [mailto:bruce.w.al...@intel.com]
>Sent: Monday, January 31, 2011 3:06 PM
>To: Nix
>Cc: e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>Because lspci simply reads the PCI configuration space (IIRC), I doubt it
>is reporting incorrect information.  The e1000e driver uses the kernel
>API to disable ASPM (when CONFIG_PCIEASPM is enabled in the kernel config
>otherwise it writes directly to the PCI configuration space to disable
>ASPM).  Assuming your kernel config has CONFIG_PCIEASPM enabled, my guess
>at this point would be there is something broken in the kernel.  With ASPM
>L0s enabled, the 82574 (and other parts supported by the driver) will most
>definitely have issues, so we need to find out what is broke and fix it.

Since it does appear to be a problem with the kernel, a brute force method
to work around the issue is to manually disable ASPM using setpci (I suggest
first trying to disable only ASPM L0s).  If disabling ASPM L0s is not enough,
then disable ASPM L1 in both the 82574 and the upstream PCI bridge.
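
To find the upstream bridge mentioned above, one option is to follow the device's sysfs path, where each PCI device directory sits below its parent. This is a sketch under that assumption; the function names and the example addresses are hypothetical.

```shell
#!/bin/sh
# Given the resolved sysfs path of a PCI device, print the name of the
# directory directly above it (its upstream bridge or root port).
upstream_of() {
    basename "$(dirname "$1")"
}

# Not called automatically: resolve a device such as 0000:02:00.0 via sysfs.
upstream_bridge() {
    upstream_of "$(readlink -f "/sys/bus/pci/devices/$1")"
}
```

For example, a device whose path resolves to /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0 would report 0000:00:1c.0. For a device that sits directly on the root bus, the parent directory is the host bridge domain (for example pci0000:00) rather than a bridge.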



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Allan, Bruce W
>-Original Message-
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Monday, January 31, 2011 2:43 PM
>To: Allan, Bruce W
>Cc: e1000-devel@lists.sourceforge.net
>Subject: Re: [E1000-devel] 82754L spontaneous freeze networking woes continue 
>in
>2.6.37
>
>On 31 Jan 2011, n...@esperi.org.uk stated:
>
>> On 31 Jan 2011, Bruce W. Allan said:
>>> Have you tried booting with pcie_aspm=off kernel parameter?
>>
>> I didn't know that parameter existed. Added, will reboot shortly: let us
>> see what happens. :)
>
>No change:
>
>LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>
>LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
>ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>
>Boot messages include:
>
>[    0.000000] PCIe ASPM is disabled
>[    2.132444] e1000e 0000:03:00.0: Disabling ASPM L0s
>[    2.293944] e1000e 0000:02:00.0: Disabling ASPM L0s
>[    8.489378] e1000e 0000:02:00.0: Disabling ASPM  L1
>
>(the latter is on the gigabit link).
>
>Either lspci is lying to me, or the kernel's attempts to disable ASPM
>are doing nothing at all.
>
>I will find out soon enough which is true, as I'm no longer doing the
>continuous pingflood, so if ASPM is on (or the problem is somewhere
>else), the card will hang again...

Because lspci simply reads the PCI configuration space (IIRC), I doubt it
is reporting incorrect information.  The e1000e driver uses the kernel
API to disable ASPM (when CONFIG_PCIEASPM is enabled in the kernel config
otherwise it writes directly to the PCI configuration space to disable
ASPM).  Assuming your kernel config has CONFIG_PCIEASPM enabled, my guess
at this point would be there is something broken in the kernel.  With ASPM
L0s enabled, the 82574 (and other parts supported by the driver) will most
definitely have issues, so we need to find out what is broken and fix it.
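
Whether CONFIG_PCIEASPM is enabled can be checked from the kernel config file, if one is available. The helper below is a minimal sketch. The function name is made up, and the config path in the usage note is an assumption that depends on how the kernel was installed.

```shell
#!/bin/sh
# Report how CONFIG_PCIEASPM is set in the kernel config file given as $1.
# Prints "y" if it is enabled and "n" otherwise (unset or missing).
pcieaspm_setting() {
    if grep -q '^CONFIG_PCIEASPM=y' "$1"; then
        echo y
    else
        echo n
    fi
}
```

On many distributions this could be run as `pcieaspm_setting "/boot/config-$(uname -r)"`. On kernels built with CONFIG_IKCONFIG_PROC, `zcat /proc/config.gz | grep PCIEASPM` gives the same information.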



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, n...@esperi.org.uk stated:

> On 31 Jan 2011, Bruce W. Allan said:
>> Have you tried booting with pcie_aspm=off kernel parameter?
>
> I didn't know that parameter existed. Added, will reboot shortly: let us
> see what happens. :)

No change:

LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

Boot messages include:

[    0.000000] PCIe ASPM is disabled
[    2.132444] e1000e 0000:03:00.0: Disabling ASPM L0s
[    2.293944] e1000e 0000:02:00.0: Disabling ASPM L0s
[    8.489378] e1000e 0000:02:00.0: Disabling ASPM  L1

(the latter is on the gigabit link).

Either lspci is lying to me, or the kernel's attempts to disable ASPM
are doing nothing at all.

I will find out soon enough which is true, as I'm no longer doing the
continuous pingflood, so if ASPM is on (or the problem is somewhere
else), the card will hang again...
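
Since this comparison has to be repeated after every reboot and every
attempted fix, it may be worth scripting. A rough sketch, assuming the
LnkCtl line keeps the "ASPM ... Enabled;" / "ASPM Disabled;" shape shown
above (other lspci versions may word it differently); aspm_from_lspci is a
hypothetical helper name, not part of any tool:

```python
import re

# Captures whatever sits between "ASPM" and the first semicolon.
LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+(?P<aspm>.+?);")


def aspm_from_lspci(text: str):
    """Return one set of enabled ASPM states per LnkCtl line in the text."""
    results = []
    for match in LNKCTL_RE.finditer(text):
        field = match.group("aspm")
        if "Disabled" in field:
            results.append(set())
        else:
            words = field.split()
            results.append({s for s in ("L0s", "L1") if s in words})
    return results
```

Fed the output of 'lspci -vvv', a non-empty set for a NIC whose driver logged
"Disabling ASPM L0s" means the disable did not take effect or did not stick.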



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Nix
On 31 Jan 2011, Bruce W. Allan said:

>>From: Nix [mailto:n...@esperi.org.uk]
>>I wonder if this has something to do with PCI ASPM? The driver turns
>>ASPM off at least partially for this NIC, but if the NIC is being
>>flipped into some sort of low-power state when transmission ceases for a
>>while, then perhaps there is a low probability of it not coming out of
>>it again properly. That would explain the symptoms I see (but so would
>>many other things, I suppose).
>
> It sounds like a kernel issue based on your description, and I would not be
> surprised if this turns out to be related to ASPM L1.  Have you verified
> whether or not ASPM L0s is actually turned off on the 82574 by checking
> the LnkCtl capability register in the output of 'lspci -vvv -d 8086:'

I think you've got it!

LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

That doesn't look very disabled, does it? (I'm feeling like a right idiot
for not noticing this before now.)

I think we can say that this, in the boot messages:

[    2.101382] e1000e 0000:03:00.0: Disabling ASPM L0s
[    2.263219] e1000e 0000:02:00.0: Disabling ASPM L0s

is not taking effect, or if it is, is not sticking for very long.

> Have you tried booting with pcie_aspm=off kernel parameter?

I didn't know that parameter existed. Added, will reboot shortly: let us
see what happens. :)

(I don't think anything else on my system is actually using ASPM yet:
it's a shame it doesn't work on this part, as one NIC spends a lot of
its time looking at a subnet with nothing but suspended machines on it,
and this is an always-on box, so a bit of power-saving there would have
been nice. Still, if the choice is between drawing power and not working,
I'd rather draw power. There aren't any firmware updates that fix this,
by any chance?)
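
After rebooting, it is worth confirming that the parameter actually reached
the kernel before drawing conclusions from it. A small sketch, assuming a
Linux system where /proc/cmdline holds the boot command line and, on kernels
built with CONFIG_PCIEASPM, /sys/module/pcie_aspm/parameters/policy lists
the policies with the active one in square brackets. The helper names are
invented for this example:

```python
def aspm_param(cmdline: str):
    """Return the value of the last pcie_aspm= option, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            value = token.split("=", 1)[1]
    return value


def active_policy(policy_text: str):
    """Return the bracketed entry from the policy file text, or None."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_text(path: str):
    """Return the file contents, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None
```

For example, aspm_param(read_text("/proc/cmdline") or "") should return
"off" on a machine booted with pcie_aspm=off.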



Re: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-31 Thread Allan, Bruce W
>-----Original Message-----
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Saturday, January 29, 2011 3:44 PM
>To: e1000-devel@lists.sourceforge.net
>Subject: [E1000-devel] 82754L spontaneous freeze networking woes continue in
>2.6.37
>
>Way back in November, in
><http://sourceforge.net/mailarchive/forum.php?thread_name=87k4kfq1at.fsf%40spindle.srvr.nix&forum_name=e1000-devel>,
>I reported a problem with the 82574 in one of my machines freezing up at
>random. This problem continues in 2.6.37, and bisection has still failed
>because the fault is so intermittent (averaging three days apart and
>sometimes taking as long as a week to freeze up, with many registers suddenly
>reset to 0xff: but sometimes it freezes in only half an hour).
>
>I moaned about it in an LWN thread as well: <http://lwn.net/Articles/416758/>
>and hmh suggested I come here, but I decided to hold off until I knew a
>bit more. Since then, I've been able to characterize it a bit. (All the
>conclusions below are tentative: perhaps I was just lucky in some cases
>and the fault happened not to kick in before I tried something else.)
>
>It happens with both the in-kernel and out-of-tree drivers in 2.6.36 and
>above, but does not affect 2.6.35 with either driver. It is *not*
>suppressed by turning off MSI-X, nor by turning off jumbo frames (both
>of which are working in 2.6.35 anyway). It is apparently suppressed by
>switching it out of gigabit mode, by turning off every machine attached
>to the subnet on which it is transmitting (though this may simply be an
>artefact caused by its not needing to send anything down the link when
>that is done), and, oddly, by pingflooding the machine (with the packets
>entering via the NIC that fails). (I've been pingflooding it for three
>weeks now, and no halts have happened. I stopped for three hours and the
>NIC locked up.)
>
>I wonder if this has something to do with PCI ASPM? The driver turns
>ASPM off at least partially for this NIC, but if the NIC is being
>flipped into some sort of low-power state when transmission ceases for a
>while, then perhaps there is a low probability of it not coming out of
>it again properly. That would explain the symptoms I see (but so would
>many other things, I suppose).

It sounds like a kernel issue based on your description, and I would not be
surprised if this turns out to be related to ASPM L1.  Have you verified
whether or not ASPM L0s is actually turned off on the 82574 by checking
the LnkCtl capability register in the output of 'lspci -vvv -d 8086:<devid>'
(where <devid> is either 10d3 or 10f6 depending on which 82574 you have)?
Have you tried booting with pcie_aspm=off kernel parameter?

Bruce.
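
If it is not obvious which device ID applies, the vendor and device files
that Linux exposes under /sys/bus/pci/devices can be scanned for both. A
minimal sketch under that assumption; find_82574 is a hypothetical helper,
and the two device IDs are the ones given above:

```python
import os

INTEL_VENDOR = 0x8086
IDS_82574 = {0x10D3, 0x10F6}  # the two 82574 device IDs named above


def find_82574(sysfs_root: str = "/sys/bus/pci/devices"):
    """Return sorted PCI addresses whose vendor/device IDs match an 82574."""
    found = []
    for addr in sorted(os.listdir(sysfs_root)):
        try:
            with open(os.path.join(sysfs_root, addr, "vendor")) as f:
                vendor = int(f.read().strip(), 16)
            with open(os.path.join(sysfs_root, addr, "device")) as f:
                device = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        if vendor == INTEL_VENDOR and device in IDS_82574:
            found.append(addr)
    return found
```

Each address returned can then be passed to 'lspci -vvv -s <address>' to
look at the LnkCtl line for that NIC only.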



[E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37

2011-01-29 Thread Nix
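Way back in November, in
<http://sourceforge.net/mailarchive/forum.php?thread_name=87k4kfq1at.fsf%40spindle.srvr.nix&forum_name=e1000-devel>,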
I reported a problem with the 82574 in one of my machines freezing up at
random. This problem continues in 2.6.37, and bisection has still failed
because the fault is so intermittent (averaging three days apart and
sometimes taking as long as a week to freeze up, with many registers suddenly
reset to 0xff: but sometimes it freezes in only half an hour).

I moaned about it in an LWN thread as well: <http://lwn.net/Articles/416758/>
and hmh suggested I come here, but I decided to hold off until I knew a
bit more. Since then, I've been able to characterize it a bit. (All the
conclusions below are tentative: perhaps I was just lucky in some cases
and the fault happened not to kick in before I tried something else.)

It happens with both the in-kernel and out-of-tree drivers in 2.6.36 and
above, but does not affect 2.6.35 with either driver. It is *not*
suppressed by turning off MSI-X, nor by turning off jumbo frames (both
of which are working in 2.6.35 anyway). It is apparently suppressed by
switching it out of gigabit mode, by turning off every machine attached
to the subnet on which it is transmitting (though this may simply be an
artefact caused by its not needing to send anything down the link when
that is done), and, oddly, by pingflooding the machine (with the packets
entering via the NIC that fails). (I've been pingflooding it for three
weeks now, and no halts have happened. I stopped for three hours and the
NIC locked up.)

I wonder if this has something to do with PCI ASPM? The driver turns
ASPM off at least partially for this NIC, but if the NIC is being
flipped into some sort of low-power state when transmission ceases for a
while, then perhaps there is a low probability of it not coming out of
it again properly. That would explain the symptoms I see (but so would
many other things, I suppose).
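
The "registers suddenly reset to 0xff" symptom is at least easy to test for
mechanically: reads from a PCI device that has stopped responding generally
come back as all ones. A rough heuristic sketch, assuming a raw register or
config-space dump is available as bytes; the function names and the 0.9
threshold are arbitrary choices for illustration, not anything from the
driver:

```python
def all_ones_fraction(dump: bytes) -> float:
    """Fraction of bytes in a register dump that read back as 0xff."""
    if not dump:
        return 0.0
    return dump.count(0xFF) / len(dump)


def looks_hung(dump: bytes, threshold: float = 0.9) -> bool:
    """Heuristic: treat a mostly-0xff dump as a device that stopped answering."""
    return all_ones_fraction(dump) >= threshold
```

Run from a periodic job, a check like this could timestamp the moment of the
freeze more precisely than waiting to notice that the link has died.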
