Re: 10.1-STABLE bce: Watchdog timeout occurred

2015-04-21 Thread Alnis Morics

On 04/21/2015 06:17 AM, Chris Ross wrote:

   I got a new [to me] system recently, a Dell PE 1950.  It has two bce parts 
on the motherboard that identify as:

bce#: QLogic NetXtreme II BCM5708 1000Base-T (B2)

   The OS I installed and kernel I'm running are from a download of a 10.1 
STABLE ISO, r281235, April 7, 2015.

   I had gone on to check out a newer stable from subversion, and build a 
custom kernel, but when I booted that one I got a bce0 that didn't seem to 
work, and kept emitting:

bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP

   So, I fell back.  But I've since noticed that even the original kernel seems 
to do this after booting.  I'm not yet running any notable amount of traffic 
through the system, but intend to make it an edge router, so certainly will be.

   Is there any sort of issue noted in the bce driver in recent 
days/weeks/months?  Are other folks seeing this diagnostic/error?

   I'll do a little more testing and see if I'm seeing it more or less often, 
but I know that in at least some cases the interface has flapped like this 
after boot for long enough that I was unable to get connected remotely, and 
resorted to a console login to reboot.

 - Chris

There are Watchdog timeout errors with some msk NICs. Both msk and bce 
are dependent on MII bus code (see /usr/src/sys/amd64/conf/GENERIC)


-Alnis
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 10.1-STABLE bce: Watchdog timeout occurred

2015-04-21 Thread Gareth Wyn Roberts

This may be caused by DMA alignment problems.
See 
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable 
for a recent thread about the msk driver.  The msk maintainer Yonghyeon 
Pyun has opted for super safe options of 32K alignment!


It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or 
BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see 
whether it makes any difference.
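
To illustrate what a larger alignment value means for a buffer address, here is a minimal Python sketch. It is not the bce(4) driver code and does not reproduce the contents of if_bcereg.h; the function names and the sample address are made up for illustration. Only the value 4096 comes from the suggestion above.

```python
def is_power_of_two(n):
    """Return True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment.

    The bit-mask trick only works for power-of-two alignments,
    so anything else is rejected.
    """
    if not is_power_of_two(alignment):
        raise ValueError("alignment must be a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    sample_addr = 0x12345  # hypothetical, unaligned address
    for alignment in (16, 4096):
        aligned = align_up(sample_addr, alignment)
        print(f"align {alignment:>4}: {hex(sample_addr)} -> {hex(aligned)}")
```

With an alignment of 4096, every buffer would start on a 4 KiB boundary, which matches the page size on amd64.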


- Gareth.

On 21/04/2015 10:52, Alnis Morics wrote:

There are Watchdog timeout errors with some msk NICs. Both msk and 
bce are dependent on MII bus code (see /usr/src/sys/amd64/conf/GENERIC)


-Alnis


Re: 10.1-STABLE bce: Watchdog timeout occurred

2015-04-21 Thread Chris Ross

On Apr 21, 2015, at 10:10, Gareth Wyn Roberts g.w.robe...@glyndwr.ac.uk wrote:
 This may be caused by DMA alignment problems.
 See 
 https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
  for a recent thread about the msk driver.  The msk maintainer Yonghyeon Pyun 
 has opted for super safe options of 32K alignment!
 
 It's a long shot, but you could try increasing BCE_DMA_ALIGN and/or 
 BCE_RX_BUF_ALIGN in the include file if_bcereg.h, say up to 4096, to see 
 whether it makes any difference.

  Well, after making that change, I was able to confirm that the problem 
doesn't seem to occur.  However, in trying to verify the problem on an 
unmodified kernel, I've rebooted a GENERIC from r281672 without that change, 
and am also not seeing the problem.  :-/  I'm not sure whether the gremlins 
have fixed something, or if I was just too critical in my initial analysis.

  For now I'll take that change out of my tree and run without it.  If I see 
the flapping again, I'll confirm that it's repeatable, then change the 
alignments as suggested and see if I see a change.
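
One way to check whether the flapping is repeatable is to count the watchdog messages per interface across boots. The following is a minimal Python sketch under that assumption; the function name and the regular expression are illustrative, and the sample lines are the ones quoted earlier in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
# search() is used so a syslog prefix in front of "bce0:" does not matter.
WATCHDOG_RE = re.compile(r"(?P<ifname>bce\d+): .*Watchdog timeout occurred")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of watchdog resets."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group("ifname")] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        "bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!",
        "bce0: link state changed to DOWN",
        "bce0: link state changed to UP",
    ]
    for ifname, n in sorted(count_watchdog_timeouts(sample_log).items()):
        print(f"{ifname}: {n} watchdog timeout(s)")
```

The same function could be fed the lines of saved dmesg output from each kernel to compare how often the reset occurs with and without the alignment change.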

  Thanks all...

 - Chris





Re: 10.1-STABLE bce: Watchdog timeout occurred

2015-04-21 Thread Yonghyeon PYUN
On Wed, Apr 22, 2015 at 12:39:16AM -0400, Chris Ross wrote:
 
   Well, after making that change, I was able to confirm that the problem 
 doesn't seem to occur.  However, in trying to verify the problem on an 
 unmodified kernel, I've rebooted a GENERIC from r281672 without that change, 
 and am also not seeing the problem.  :-/  I'm not sure whether the gremlins 
 have fixed something, or if I was just too critical in my initial analysis.
 
   For now I'll take that change out of my tree and run without it.  If I see 
 the flapping again, I'll confirm that it's repeatable, then change the 
 alignments as suggested and see if I see a change.
 

I guess the alignment issue of msk(4) has nothing to do with bce(4)
watchdog timeouts.  It would be more helpful to know the details of
your controller (bce(4)/brgphy(4)-related dmesg output, pciconf
output, etc.) and your network setup.
If you know a reliable way to trigger the watchdog timeouts,
please share that info too.  As a first step to narrow down the
issue, I would try disabling all hardware offloading features (TSO,
checksum offload, VLAN H/W tagging, etc.) and see whether that makes
any difference.
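
As a rough sketch of the offload test, the Python snippet below builds the ifconfig command lines that would turn off TSO, checksum offload and VLAN hardware tagging. It only prints the commands and does not run them. The flag names are the ones documented in FreeBSD's ifconfig(8); the helper name is hypothetical, and bce1 is assumed to be the name of the second on-board port.

```python
import shlex

# ifconfig(8) capability flags; a leading "-" disables the feature.
OFFLOAD_FLAGS = ("-tso", "-txcsum", "-rxcsum", "-vlanhwtag")


def build_disable_offload_cmd(ifname, flags=OFFLOAD_FLAGS):
    """Return the argv list that disables the given offload features."""
    return ["ifconfig", ifname, *flags]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for ifname in ("bce0", "bce1"):
        print(shlex.join(build_disable_offload_cmd(ifname)))
```

Disabling the features one at a time instead of all at once would help show which of them, if any, is involved.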

Thanks.


10.1-STABLE bce: Watchdog timeout occurred

2015-04-20 Thread Chris Ross

  I got a new [to me] system recently, a Dell PE 1950.  It has two bce parts on 
the motherboard that identify as:

bce#: QLogic NetXtreme II BCM5708 1000Base-T (B2)

  The OS I installed and kernel I'm running are from a download of a 10.1 
STABLE ISO, r281235, April 7, 2015.

  I had gone on to check out a newer stable from subversion, and build a custom 
kernel, but when I booted that one I got a bce0 that didn't seem to work, and 
kept emitting:

bce0: /usr/src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP

  So, I fell back.  But I've since noticed that even the original kernel seems 
to do this after booting.  I'm not yet running any notable amount of traffic 
through the system, but intend to make it an edge router, so certainly will be.

  Is there any sort of issue noted in the bce driver in recent 
days/weeks/months?  Are other folks seeing this diagnostic/error?

  I'll do a little more testing and see if I'm seeing it more or less often, 
but I know that in at least some cases the interface has flapped like this 
after boot for long enough that I was unable to get connected remotely, and 
resorted to a console login to reboot.

- Chris


