Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-17 Thread Fredrik Widlund
Hi,

We have a Dell 1950 with the same problem (bce). We tried
debug.mpsafenet=0, but to no avail. It's a very frustrating show-stopper
for us as well; we're moving all our 1950s out of the production environment.
Any help would be greatly appreciated.

See the attached mail to freebsd-current.

Kind regards,
Fredrik Widlund

---BeginMessage---
Hi,

Suddenly the problem occurred again. We are running the same setup as
below, but with debug.mpsafenet=0, and it didn't help. This is indeed a
showstopper for us; we are moving all our Dell 1950s out of the production
environment until we can solve this issue. Any help would be greatly
appreciated.

Kind regards,
Fredrik Widlund

bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred,
resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred,
resetting!
bce0: link state changed to DOWN
bce0: link state changed to UP
[repeat 30 times]

# vmstat -i
interrupt                          total       rate
irq14: ata0                           47          0
irq16: bce0 bce1                    3019          5
irq18: mfi0                          123          0
irq21: uhci0 uhci+                     6          0
irq64: mpt0                         1214          2
cpu0: timer                      1118344       1997
Total                            1122753       2004
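
For reading tables like the one above, a small filter can flag the interrupt lines that list more than one device. This is only a sketch: the `shared_irqs` name is made up here, the sample rows are modeled on the table above, and it assumes the last two columns of each line are always the total and the rate.

```shell
#!/bin/sh
# Print the IRQ lines of `vmstat -i`-style input that name more than one device.
# Assumption: the last two fields on each line are the total and rate columns.
shared_irqs() {
    awk '$1 ~ /^irq[0-9]+:$/ && NF > 4 {
        line = $1
        for (i = 2; i <= NF - 2; i++)
            line = line " " $i
        print line
    }'
}

# Sample rows modeled on the table above.
# On a live system this would be: vmstat -i | shared_irqs
sample='irq14: ata0 47 0
irq16: bce0 bce1 3019 5
irq18: mfi0 123 0
irq21: uhci0 uhci+ 6 0
irq64: mpt0 1214 2
cpu0: timer 1118344 1997'

printf '%s\n' "$sample" | shared_irqs
```

Lines with a single device, and the cpu timer line, are skipped; only the irq16 and irq21 rows of the sample are printed.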

Fredrik Widlund wrote:
 Hi,

 I can't reproduce the problem. Everything is exactly the same, but I get
 no timeouts and the nic seems to work without any problems.

 Kind regards,
 Fredrik Widlund


 Fredrik Widlund wrote:
   
 Hi,

 An update: right now the bce NIC seems to work, though I'm not sure exactly
 why yet. I'm attaching the dmesg.

 SAS adapter is the PERC 5I, which is handled by the MPT driver in
 6.2-Beta2. I'll continue to look at this. There are some unhandled
 events (0x12, 0x16), but these might not be needed.

 [mpi_ioc.h]
 #define MPI_EVENT_SAS_PHY_LINK_STATUS   (0x0012)
 ...
 #define MPI_EVENT_SAS_DISCOVERY (0x0016)

 [dmesg mpt part]
 mpt0: LSILogic SAS/SATA Adapter port 0xec00-0xecff mem
 0xfc7fc000-0xfc7f,0xfc7e-0xfc7e irq 64 at device 8.0 on pci2
 mpt0: [GIANT-LOCKED]
 mpt0: MPI Version=1.5.12.0
 mpt0: mpt_cam_event: 0x16
 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).
 mpt0: mpt_cam_event: 0x12
 mpt0: Unhandled Event Notify Frame. Event 0x12 (ACK not required).
 mpt0: mpt_cam_event: 0x16
 mpt0: Unhandled Event Notify Frame. Event 0x16 (ACK not required).

 Kind regards,
 Fredrik Widlund

 Fredrik Widlund wrote:
   
 
 Hi,

 I'm trying to get FreeBSD working on Dell 1950 (and 2950), which is
 vital since it's no longer possible to buy 1850/2850 units here.

 Hardware:
 PE1950 Xeon 5130, 2GB 667MHz
 SAS 5I
 PERC5E

 6.1-RELEASE: not possible since SAS drives aren't found.
 6.2-BETA2: bce interfaces do not work at all; a watchdog timeout
 occurs every other second, and there is _no_ connectivity.

 We are also having problems with some PE1850 failing from time to time
 with watchdog timeout hangs, and have had to debug.mpsafenet=0 these.

 How can we help solve this issue? It would really be a pity to be
 forced to leave FreeBSD but we really can't afford to replace our
 choice of hardware platform.

 Kind regards,
 Fredrik Widlund





 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to
 [EMAIL PROTECTED]
 
   
   
 

 Copyright (c) 1992-2006 The FreeBSD Project.
 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
 FreeBSD is a registered trademark of The FreeBSD Foundation.
 FreeBSD 6.2-BETA2 #0: Mon Oct  2 03:32:44 UTC 2006
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SMP
 Timecounter i8254 frequency 1193182 Hz quality 0
 CPU: Intel(R) Xeon(R) CPU 5130 @ 2.00GHz (1995.01-MHz 686-class CPU)
   Origin = GenuineIntel  Id = 0x6f6  Stepping = 6
   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,b9,CX16,b14,b15,b18>
   AMD Features=0x2010<NX,LM>
   AMD Features2=0x1<LAHF>
   Cores per package: 2
 real memory  = 2147123200 (2047 MB)
 avail memory = 2096009216 (1998 MB)
 ACPI APIC Table: DELL   PE_SC3  
 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
 ioapic0: Changing APIC ID to 2
 ioapic1: Changing APIC ID to 3
 ioapic1: WARNING: intbase 64 != expected base 24
 ioapic0 Version 2.0 irqs 0-23 on motherboard
 ioapic1 Version 2.0 irqs 64-87 on motherboard
 kbd1 at kbdmux0
 ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, 

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-05 Thread Guy Brand
Scott Long ([EMAIL PROTECTED]) on 04/10/2006 at 14:49 wrote:

 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
 # OK
 #
 #*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
 # BROKEN
 ...
 
 #*default release=cvs tag=RELENG_6
 # BROKEN
 
   From sys commitlogs the culprit commits are:
 
   glebius 2006-08-08 09:19:25 utc
   glebius 2006-08-08 09:20:26 utc

 So you tested before these two changes and after these two changes, yes?

  Yes that's it.

 What about with just the first change and not the second?  Anyways, I'm 

  Because building a kernel that only has the first change (2006-08-08
  09:19:25) fails.

 Can you try a quick test?  Reboot and press '6' at the FreeBSD loader
 menu.  That will drop you to a prompt.  Then enter the following line:
 
 set hint.apic.0.disabled=1

  Done: synced to STABLE-6 as of this morning (9:00 UTC), made world and
  kernel, and booted with APIC disabled. Still the same freeze after starting
  X and loading a few tabs in Firefox.

  Thanks for the suggestion Scott.
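
A value entered with `set` at the loader prompt applies to that boot only. If a tunable is to be kept across reboots, it is normally placed in /boot/loader.conf. The sketch below appends a setting only when it is not already present; the `set_loader_var` helper is hypothetical, and the demonstration writes to a temporary file rather than to the real loader.conf.

```shell
#!/bin/sh
# Append name="value" to a loader.conf-style file unless the name is already set.
# $1: file, $2: tunable name, $3: value
set_loader_var() {
    grep -q "^$2=" "$1" 2>/dev/null || printf '%s="%s"\n' "$2" "$3" >> "$1"
}

# Demonstration against a scratch file; the real target would be /boot/loader.conf.
conf=$(mktemp)
set_loader_var "$conf" hint.apic.0.disabled 1
set_loader_var "$conf" debug.mpsafenet 0
set_loader_var "$conf" debug.mpsafenet 0   # second call changes nothing
cat "$conf"
rm -f "$conf"
```

Both tunables shown are the ones mentioned in this thread; whether either of them helps is exactly what the testers here are trying to find out.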

-- 
  bug

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Guy Brand
Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:

 One thing this patch definitely did do though, is break the nvidia
 driver pretty badly.  Couldn't keep the X server running for more than a
 minute before it froze solid.  Lots of Xid: blah blah blah messages.
 Yes I remembered to rebuild the kernel module ;)

  Hi,


  Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG  i386 on a box having em sharing
  IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                          total       rate
  irq1: atkbd0                           5          0
  irq14: ata0                           47          0
  irq16: nvidia0 em+                 86545        185
  irq17: fwohci0                         7          0
  irq21: twe0                         6426         13
  cpu0: timer                       927735       1986
  Total                            1020765       2185

  I freeze the box by starting Firefox, which reloads a few tabs I keep
  open in my session when under X. This is perfectly reproducible.
  From the logs, first I see:

Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010597
Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel
Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010598
Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010599
Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059a
Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059b
Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059c
Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059d
Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059e
Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059f
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a0

  then come the watchdogs:

Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a2
Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a4
Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a5
Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several ISO images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.

  Note that I can't freeze the box when doing several simultaneous big
  downloads or tarring up a lot of files but NOT running X. So I guess
  it is a shared nvidia/em IRQ issue.
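
To compare runs, it can help to count watchdog resets per interface straight from the log. A rough sketch, assuming syslog or dmesg lines in which the driver instance appears as a field such as `em0:` or `bce0:`; the `count_watchdogs` name and the sample lines are illustrative only.

```shell
#!/bin/sh
# Count "watchdog timeout" messages per driver instance.
# The first field that looks like "name<unit>:" (e.g. em0:) is taken as the device.
count_watchdogs() {
    awk '/[Ww]atchdog timeout/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[a-z]+[0-9]+:$/) {
                sub(/:$/, "", $i)
                n[$i]++
                break
            }
    }
    END { for (dev in n) print dev, n[dev] }' | sort
}

# Sample lines in the two formats seen in this thread.
# On a live system this could be: count_watchdogs < /var/log/messages
sample='Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
bce0: /usr/src/sys/dev/bce/if_bce.c(5032): Watchdog timeout occurred,'

printf '%s\n' "$sample" | count_watchdogs
```

The link state line is ignored, so the sample yields one reset for bce0 and two for em0.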

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The DEBUG kernconf is GENERIC + witness options enabled (but they
  do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...

#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...

#*default release=cvs tag=RELENG_6
# BROKEN
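
The checkouts above differ only in the `date=` field of the supfile's default line. A helper along these lines could print that line for each timestamp being tested. The `supfile_default` name is invented for this sketch, and a complete supfile would need further settings (server host, collections) that are not shown in the message.

```shell
#!/bin/sh
# Print a "*default" line pinning a CVS tag, and optionally a point in time.
# $1: CVS tag, $2: optional date stamp in YYYY.MM.DD.hh.mm.ss form
supfile_default() {
    if [ -n "${2:-}" ]; then
        printf '*default release=cvs tag=%s date=%s\n' "$1" "$2"
    else
        printf '*default release=cvs tag=%s\n' "$1"
    fi
}

supfile_default RELENG_6 2006.08.08.09.12.56   # last state reported OK
supfile_default RELENG_6 2006.08.08.09.21.00   # first state reported BROKEN
supfile_default RELENG_6                       # head of the branch
```

Narrowing the window between the last OK and the first BROKEN timestamp is what isolates the commits listed next.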

  From sys commitlogs the culprit commits are:

  glebius 2006-08-08 09:19:25 utc
  freebsd src repository

  modified files:

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Scott Long [EMAIL PROTECTED]:

 Corrected patch is at:
 
 http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

I have a Dell 1950 here that's been dedicated to helping solve this
problem.  I can reliably reproduce the watchdog timeout by doing
the following steps:

1) Mount /usr/src via nfs
2) start a -j99 buildworld
3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory

Usually only takes a few minutes before a watchdog occurs, and I have
no more networking.

Your patch applied cleanly, and everything built OK.  The results are:
a) My USB keyboard stopped working :(
b) The problem does _not_ improve.

In my case, it's a bce driver that's doing it.  I also have some em
cards in this machine that I can test if the information will be
helpful.

This is quite a show-stopper for us, if there's any other testing/etc
I can do, _please_ let me know.  I might even be able to get remote
console access to this machine approved for a developer.

-- 
Bill Moran
Collaborative Fusion Inc.


IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Bill Moran [EMAIL PROTECTED]:

 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.

Note that I can _not_ reproduce the problem with an em interface (a
PCI NIC).  As mentioned earlier, I can reliably and easily produce
a watchdog timeout on the bce interface (onboard).  The em interface
seems rock-solid.

I guess I have a workaround for now, but the offer to test/provide
more information stands.

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Kris Kennaway
On Wed, Oct 04, 2006 at 10:40:25AM -0400, Bill Moran wrote:
 In response to Scott Long [EMAIL PROTECTED]:
 
  Corrected patch is at:
  
  http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
 
 I have a Dell 1950 here that's been dedicated to helping solve this
 problem.  I can reliably reproduce the watchdog timeout by doing
 the following steps:
 
 1) Mount /usr/src via nfs
 2) start a -j99 buildworld
 3) On a different terminal, do tar czvf /usr/src/temp.tgz /big/directory
 
 Usually only takes a few minutes before a watchdog occurs, and I have
 no more networking.
 
 Your patch applied cleanly, and everything built OK.  The results are:
 a) My USB keyboard stopped working :(
 b) The problem does _not_ improve.
 
 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.
 
 This is quite a show-stopper for us, if there's any other testing/etc
 I can do, _please_ let me know.  I might even be able to get remote
 console access to this machine approved for a developer.

Remote console access would be a help.  I suspect there may be more
than one problem here.

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Mike Tancsa

At 12:27 PM 10/4/2006, Bill Moran wrote:

In response to Bill Moran [EMAIL PROTECTED]:

 In my case, it's a bce driver that's doing it.  I also have some em
 cards in this machine that I can test if the information will be
 helpful.

Note that I can _not_ reproduce the problem with an em interface (a
PCI NIC).  As mentioned earlier, I can reliably and easily produce


Hi, Just to clarify, you mean without the patch you do run into the 
problem, but with the patch you cannot generate the problem ? Or with 
the em NIC, you have never seen the issue at all ?


---Mike 




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Mike Tancsa [EMAIL PROTECTED]:

 At 12:27 PM 10/4/2006, Bill Moran wrote:
 In response to Bill Moran [EMAIL PROTECTED]:
 
   In my case, it's a bce driver that's doing it.  I also have some em
   cards in this machine that I can test if the information will be
   helpful.
 
 Note that I can _not_ reproduce the problem with an em interface (a
 PCI NIC).  As mentioned earlier, I can reliably and easily produce
 
 Hi, Just to clarify, you mean without the patch you do run into the 
 problem, but with the patch you cannot generate the problem ? Or with 
 the em NIC, you have never seen the issue at all ?

Without patch:
* bce locks up easily
* Unable to lock up em
* keyboard works
With patch:
* bce locks up easily
* unable to lock up em
* keyboard doesn't work

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Bill Moran
In response to Kris Kennaway [EMAIL PROTECTED]:

  This is quite a show-stopper for us, if there's any other testing/etc
  I can do, _please_ let me know.  I might even be able to get remote
  console access to this machine approved for a developer.
 
 Remote console access would be a help.  I suspect there may be more
 than one problem here.

In progress ... I'll contact you privately when it's ready.

-- 
Bill Moran
Collaborative Fusion Inc.




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Scott Long

Guy Brand wrote:

Craig Boston ([EMAIL PROTECTED]) on 29/09/2006 at 20:19 wrote:



One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)



  Hi,


  Since rebuilding to 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #1: Mon
  Oct  2 15:24:04 CEST 2006 DEBUG  i386 on a box having em sharing
  IRQ with nvidia (NVIDIA-FreeBSD-x86-1.0-8756):

  interrupt                          total       rate
  irq1: atkbd0                           5          0
  irq14: ata0                           47          0
  irq16: nvidia0 em+                 86545        185
  irq17: fwohci0                         7          0
  irq21: twe0                         6426         13
  cpu0: timer                       927735       1986
  Total                            1020765       2185

  I freeze the box by starting firefox which reloads a few tabs I keep
  open in my session when under X. This is perfectly reproductible.
  From the logs, first I see:

Oct  2 16:47:39 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010597
Oct  2 16:47:43 mojito kernel: NVRM: Xid (0001:00): 8, Channel
Oct  2 16:47:47 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010598
Oct  2 16:47:55 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 00010599
Oct  2 16:48:03 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059a
Oct  2 16:48:11 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059b
Oct  2 16:48:19 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059c
Oct  2 16:48:27 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059d
Oct  2 16:48:35 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059e
Oct  2 16:48:43 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 0001059f
Oct  2 16:48:52 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a0

  then come the watchdogs:

Oct  2 16:48:56 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:48:56 mojito kernel: em0: link state changed to DOWN
Oct  2 16:48:58 mojito kernel: em0: link state changed to UP
Oct  2 16:49:00 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a1
Oct  2 16:49:06 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:06 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:08 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a2
Oct  2 16:49:08 mojito kernel: em0: link state changed to UP
Oct  2 16:49:16 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a3
Oct  2 16:49:16 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:16 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:18 mojito kernel: em0: link state changed to UP
Oct  2 16:49:24 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a4
Oct  2 16:49:26 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:26 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:29 mojito kernel: em0: link state changed to UP
Oct  2 16:49:32 mojito kernel: NVRM: Xid (0001:00): 16, Head  Count 000105a5
Oct  2 16:49:36 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:36 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:39 mojito kernel: em0: link state changed to UP
Oct  2 16:49:47 mojito kernel: em0: watchdog timeout -- resetting
Oct  2 16:49:47 mojito kernel: em0: link state changed to DOWN
Oct  2 16:49:49 mojito kernel: em0: link state changed to UP

  and the box ends up frozen less than a minute later. The traffic
  on the Intel card can be low (pinging a host for a few dozen of
  seconds), medium (reloading a few pages in the tabs of Firefox) or
  high (downloading several iso images from our local FTP mirror):
  whatever I do, if both nvidia and em0 are used, the box freezes.

  Note that I can't freeze the box when doing several simultaneous big
  downloads or taring up a lot of files but NOT running X. So I guess
  it is a shared nvidia/em IRQ issue.

  FreeBSD 6.1-STABLE #0: Fri Jun 23 17:00:43 CEST 2006 had no such problem.
  The DEBUG kernconf is GENERIC + witness options enabled (but they
  do not help in this case).

  I traced back to find which changeset introduced the trouble. The
  results are:

#*default release=cvs tag=RELENG_6 date=2006.06.23.17.00.00
# OK
...

#*default release=cvs tag=RELENG_6 date=2006.08.08.09.12.56
# OK
#
#*default release=cvs tag=RELENG_6 date=2006.08.08.09.21.00
# BROKEN
...

#*default release=cvs tag=RELENG_6
# BROKEN

  From sys commitlogs the culprit commits are:

  glebius 2006-08-08 09:19:25 utc
  freebsd src repository

  

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Martin Blapp


Hi,

What about with just the first change and not the second?  Anyways, I'm
starting to see a trend here.  Problem reports are clustering around UP
systems, not SMP systems.  I don't know if that's just coincidence or not.


We also have about twenty SMP systems, seven of them now running 6.1 Prerelease,
and none of them are affected. bge and em cards are working fine,
even under high load.


Martin


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Jorge Aldana
I have also been using em (on-board NIC) with SMP without any problems. I just
upgraded to check, and all is still fine:


New kernel : FreeBSD 6.2-PRERELEASE #7: Mon Oct  2 15:15:47 PDT 2006
Old kernel : FreeBSD 6.1-STABLE #4: Wed Sep  6 16:01:23 PDT 2006

I also have nvidia and use firefox with pre-saved tabs (~30), all works fine 
even on re-loading.


Let me know if you would like any other info.

Jorge

On Thu, 5 Oct 2006, Martin Blapp wrote:



Hi,

We've got also about twenty SMP Systems, seven of them now with 6.1 
Prerelease and we don't have any affected systems. bge- and em- cards are 
working fine, even under high load situations.


Martin



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-10-04 Thread Brian

Martin Blapp wrote:


Hi,

What about with just the first change and not the second?  Anyways, 
I'm starting to see a trend here.  Problem reports are clustering 
around UP
systems, not SMP systems.  I don't know if that's just coincidence or 
not.


We've got also about twenty SMP Systems, seven of them now with 6.1 
Prerelease and we don't have any affected systems. bge- and em- cards 
are working fine, even under high load situations.


Martin

I remember having this problem a few years ago on an OpenBSD box with two
NICs.  At that time, I found a mailing list post outlining a process
where you'd enter a break sequence to get to a command prompt before
booting and enter some command there, I believe to disable ACPI, and
that would help.  It's been 3-4 years, so I don't remember the details.



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Pete French
 Are you enabling an option, like IPv6, that puts Giant over the network 
 stack?

I'm not enabling anything, but if INET6 is part of GENERIC (which I think it is,
isn't it?) then I would have that in my kernels, as they basically look
like this:

include GENERIC

options SMP

device  pf
device  atapicam

options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_CDNR
options ALTQ_PRIQ
options ALTQ_NOPCC

Actually, how do I 'unoption' something which has already been included,
is there some equivalent to 'nodevice' for options ?

-pete.



 Scott




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Kris Kennaway
On Sun, Oct 01, 2006 at 01:37:38PM +0100, Pete French wrote:
  Are you enabling an option, like IPv6, that puts Giant over the network 
  stack?
 
 Am not enabling anything, but if INET6 is part of GENERIC (which I think it is
 isn't it?) then I would have that in my kernels as they basically look
 like this:
 
   include GENERIC
 
   options SMP
 
   device  pf
   device  atapicam
 
   options ALTQ
   options ALTQ_CBQ
   options ALTQ_RED
   options ALTQ_RIO
   options ALTQ_HFSC
   options ALTQ_CDNR
   options ALTQ_PRIQ
   options ALTQ_NOPCC
 
 Actually, how do I 'unoption' something which has already been included,
 is there some equivalent to 'nodevice' for options ?

nooption

Kris




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Erik Trulsson
On Sun, Oct 01, 2006 at 01:37:38PM +0100, Pete French wrote:
  Are you enabling an option, like IPv6, that puts Giant over the network 
  stack?
 
 Am not enabling anything, but if INET6 is part of GENERIC (which I think it is
 isn't it?) then I would have that in my kernels as they basically look
 like this:
 
   include GENERIC
 
   options SMP
 
   device  pf
   device  atapicam
 
   options ALTQ
   options ALTQ_CBQ
   options ALTQ_RED
   options ALTQ_RIO
   options ALTQ_HFSC
   options ALTQ_CDNR
   options ALTQ_PRIQ
   options ALTQ_NOPCC
 
 Actually, how do I 'unoption' something which has already been included,
 is there some equivalent to 'nodevice' for options ?

Yes, there is such a thing. It is (not too surprisingly) spelled 'nooption'
and is actually documented in the config(5) manpage.
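
As a sketch of what that could look like for the configuration quoted above: a config file that includes GENERIC and then removes one option. MYKERNEL is a placeholder name, INET6 is used only because it was the option under discussion, and the script writes to the current directory rather than into the kernel source tree.

```shell
#!/bin/sh
# Write a kernel config that inherits GENERIC and then drops one option.
# MYKERNEL is a hypothetical name; a real config file lives in the
# architecture's conf directory of the kernel sources.
cat > MYKERNEL <<'EOF'
include   GENERIC
ident     MYKERNEL

options   SMP
nooption  INET6
EOF

cat MYKERNEL
```

The `nooption` line has to come after the `include`, since it removes something the included file has already turned on.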


-- 
Insert your favourite quote here.
Erik Trulsson
[EMAIL PROTECTED]


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Martin Nilsson

Just an observation.

All the boxes I've had this problem on have _two_ em interfaces. I have 
never seen it on my boxes with just one em NIC.


The error is always em0 timeout - never em1 (I haven't seen any!)

Yesterday my local network got completely wacky. The gateway had em0
timeouts on the screen, but em0 is the _outside_ interface; the Windows box
that I had to reboot was attached to the inside, on em1!


Could there be something wrong in the driver if we have more than one em 
interface?


Regards,
Martin


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Scott Long

Martin Nilsson wrote:

Just an observation.

All the boxes I've had this problem on have _two_ em interfaces. I have 
never seen it on my boxes with just one em NIC.


The error is always em0 timeout - never em1 (I haven't seen any!)

Yesterday my local network got completely wacky, the gateway had em0 
timeouts on the screen: but em0 is the _outside_ the windows box that I 
had to reboot was attached to the inside on em1!


Could there be something wrong in the driver if we have more than one em 
interface?


Regards,
Martin


Multiple instances of the driver have no knowledge of each other.
Nothing between them is shared.  Even if they share an interrupt,
it is a detail that is hidden.

Scott


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread David G Lawrence
> Just an observation.
>
> All the boxes I've had this problem on have _two_ em interfaces. I have
> never seen it on my boxes with just one em NIC.
>
> The error is always em0 timeout - never em1 (I haven't seen any!)
>
> Yesterday my local network went completely wacky. The gateway had em0
> timeouts on the screen, but em0 is the _outside_ interface; the Windows
> box that I had to reboot was attached to the inside, on em1!
>
> Could there be something wrong in the driver if we have more than one em
> interface?

   A machine I have here that shows the problem has one fxp and one em and
the timeouts occur on both interfaces.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-10-01 Thread Matthew Jacob

Just to add a data point: I just upgraded feral.com to the latest
RELENG_6 branch. I have a dual port em for internal networks and I've
never seen the problems reported.

Also, for -current, things have been stable again for the last
week or so for em on multiple machines (most of which have dual em
interfaces).


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Paul Allen
From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM -0400:
> On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
> > On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
> > > On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > > > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> > >
> > > At first glance it appeared to work, but I'm about to do some more
> > > testing since I just discovered that I have to kldload something
> > > (anything) first in order to reproduce the problem.  Weird.
> >
> > I can confirm that despite the other side effect I already mentioned,
> > this patch does fix or at least mask the problem I'm seeing with em (and
> > probably usb).
>
> Which is odd since the hypothesis Scott was working on should have
> shown up clearly in the mutex trace, but did not.

But it is consistent with there being a beat-frequency problem with
respect to the scheduler.  I think the number you really need is not
how long Giant was held but how long threads spent waiting for it.
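
To make the difference between hold time and wait time concrete, here is
a minimal userland sketch. It uses POSIX mutexes rather than kernel
mutexes and is not the MUTEX_PROFILING code; struct lock_stats,
timed_lock() and timed_unlock() are hypothetical names introduced only
for this illustration.

```c
/* Ask for the POSIX.1-2008 API so clock_gettime() is declared. */
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-lock counters; not a FreeBSD kernel structure. */
struct lock_stats {
	uint64_t acquisitions;		/* successful lock operations */
	uint64_t wait_ns;		/* total time spent blocked in lock */
	uint64_t hold_ns;		/* total time the lock was held */
	struct timespec acquired;	/* when the current holder got it */
};

/* Nanoseconds from 'from' to 'to'; clamps to 0 if the clock misbehaves. */
static uint64_t
elapsed_ns(const struct timespec *from, const struct timespec *to)
{
	int64_t ns = (int64_t)(to->tv_sec - from->tv_sec) * 1000000000LL +
	    (to->tv_nsec - from->tv_nsec);

	return (ns > 0 ? (uint64_t)ns : 0);
}

/* Acquire the mutex and charge the time spent blocked to wait_ns. */
void
timed_lock(pthread_mutex_t *m, struct lock_stats *st)
{
	struct timespec before, after;

	clock_gettime(CLOCK_MONOTONIC, &before);
	pthread_mutex_lock(m);
	clock_gettime(CLOCK_MONOTONIC, &after);

	/* The counters are only touched while the mutex is held. */
	st->wait_ns += elapsed_ns(&before, &after);
	st->acquisitions++;
	st->acquired = after;
}

/* Charge the time since acquisition to hold_ns, then release. */
void
timed_unlock(pthread_mutex_t *m, struct lock_stats *st)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	st->hold_ns += elapsed_ns(&st->acquired, &now);
	pthread_mutex_unlock(m);
}
```

A lock can show a small hold_ns total and still have a large wait_ns if
many threads queue up behind short holds, which is the case a hold-time
figure alone would not reveal.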


Paul



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Scott Long

Craig Boston wrote:


> On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
>
> At first glance it appeared to work, but I'm about to do some more
> testing since I just discovered that I have to kldload something
> (anything) first in order to reproduce the problem.  Weird.
>
> One thing this patch definitely did do though, is break the nvidia
> driver pretty badly.  Couldn't keep the X server running for more than a
> minute before it froze solid.  Lots of Xid: blah blah blah messages.
> Yes I remembered to rebuild the kernel module ;)
>
> Oh, and if anyone is curious, I am able to reproduce the problem after
> booting without nvidia.ko loaded, using qemu in -nographic mode.  Just
> wanted to rule that out since it's code that's out of our control and
> would be a prime target to blame if I didn't.
>
> Craig


My patch shouldn't have a single effect on nvidia.  It just gets the USB 
out of the way of other drivers.  Weird.  But what does 'blah blah' 
translate into?


Scott



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-30 Thread Scott Long

David G Lawrence wrote:


> > > Attached is a simple user program that will immediately cause pretty much
> > > all of the network drivers (at least the ones I own) to stop working and
> > > get watchdog timeouts.
> >
> > I am running this on a single processor machine with an SMP kernel and
> > it does not have any effect. I don't think I have any single processor
> > machines running a non SMP kernel to try it on though. Not particularly
> > helpful I know. I'll
>
> Actually, I think it is helpful to know that the program only has an
> effect on some machines. We just need to figure out what the common
> denominator is.




Are you enabling an option, like IPv6, that puts Giant over the network 
stack?


Scott



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-30 Thread David G Lawrence
> Are you enabling an option, like IPv6, that puts Giant over the network
> stack?

From dmesg:

WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.

   ...the kernel has IPSEC.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-30 Thread Robert Watson

On Sat, 30 Sep 2006, Scott Long wrote:


> David G Lawrence wrote:
> > > > Attached is a simple user program that will immediately cause pretty
> > > > much all of the network drivers (at least the ones I own) to stop
> > > > working and get watchdog timeouts.
> > >
> > > I am running this on a single processor machine with an SMP kernel and
> > > it does not have any effect. I don't think I have any single processor
> > > machines running a non SMP kernel to try it on though. Not particularly
> > > helpful I know. I'll
> >
> > Actually, I think it is helpful to know that the program only has an
> > effect on some machines. We just need to figure out what the common
> > denominator is.
>
> Are you enabling an option, like IPv6, that puts Giant over the network
> stack?


IPv6 has Giant over its netisr, but not over the entire network stack.  If 
Giant is being placed over the stack due to use of an option that forces it 
(such as KAME IPSEC) you should be able to grep this out of dmesg by doing 
something along the lines of the following:


grep "WARNING: debug.mpsafenet" /var/run/dmesg.boot

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-30 Thread Robert Watson


On Fri, 29 Sep 2006, David G Lawrence wrote:


> > Are you enabling an option, like IPv6, that puts Giant over the network
> > stack?
>
> From dmesg:
>
> WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
> WARNING: MPSAFE network stack disabled, expect reduced performance.
>
>    ...the kernel has IPSEC.

If you're not using IPv6 over IPSEC, consider trying FAST_IPSEC instead.

Robert N M Watson
Computer Laboratory
University of Cambridge


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 11:05:35PM -0700, Paul Allen wrote:
> From Kris Kennaway [EMAIL PROTECTED], Fri, Sep 29, 2006 at 09:42:42PM -0400:
> > On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
> > > On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
> > > > On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
> > > > > http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
> > > >
> > > > At first glance it appeared to work, but I'm about to do some more
> > > > testing since I just discovered that I have to kldload something
> > > > (anything) first in order to reproduce the problem.  Weird.
> > >
> > > I can confirm that despite the other side effect I already mentioned,
> > > this patch does fix or at least mask the problem I'm seeing with em (and
> > > probably usb).
> >
> > Which is odd since the hypothesis Scott was working on should have
> > shown up clearly in the mutex trace, but did not.
>
> But it is consistent with there being a beat-frequency problem with
> respect to the scheduler.  I think the number you really need is not
> how long Giant was held but how long threads spent waiting for it.

It also seemed to show that nothing was really waiting for it (the
cnt_* entries).

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Craig Boston
On Sat, Sep 30, 2006 at 12:14:17AM -0600, Scott Long wrote:
> > One thing this patch definitely did do though, is break the nvidia
> > driver pretty badly.  Couldn't keep the X server running for more than a
> > minute before it froze solid.  Lots of Xid: blah blah blah messages.
> > Yes I remembered to rebuild the kernel module ;)
>
> My patch shouldn't have a single effect on nvidia.  It just gets the USB
> out of the way of other drivers.  Weird.  But what does 'blah blah'
> translate into?

It didn't make any sense to me either after looking at the patch...  I'm
100% sure that was the only change between boots, and it started working
again after I reverted the sys/dev/usb directory and rebuilt. (svk is
great for juggling patch sets around)

That's one of the reasons I briefly suspected the nvidia driver causing
problems somewhere, so I removed that from the mix just to be sure.

'blah blah' translates into numbers that mean nothing to me, but they
may be useful to someone:

Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae5
Sep 29 16:57:09 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae4
Sep 29 16:57:11 kernel: NVRM: Xid (0001:00): 8, Channel 
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae6
Sep 29 16:57:17 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae5
Sep 29 16:57:19 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae7
Sep 29 16:57:25 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae6
Sep 29 16:57:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae8
Sep 29 16:57:33 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae7
Sep 29 16:57:35 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head  Count 0ae9
Sep 29 16:57:41 kernel: NVRM: Xid (0001:00): 16, Head 0001 Count 0ae8
Sep 29 16:57:43 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:57:49 kernel: NVRM: Xid (0001:00): 16, Head  Count 0aea
Sep 29 16:58:19 kernel: NVRM: Xid (0001:00): 8, Channel 
Sep 29 16:58:27 kernel: NVRM: Xid (0001:00): 8, Channel 001e
Sep 29 16:58:51 last message repeated 3 times
Sep 29 16:58:51 kernel: NVRM: Xid (0001:00): 7, Ch 0001 M D 
bfef0007 intr 0001



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-30 Thread Craig Boston
On Sat, Sep 30, 2006 at 02:39:06PM -0400, Kris Kennaway wrote:
> > > Which is odd since the hypothesis Scott was working on should have
> > > shown up clearly in the mutex trace, but did not.
> >
> > But it is consistent with there being a beat-frequency problem with
> > respect to the scheduler.  I think the number you really need is not
> > how long Giant was held but how long threads spent waiting for it.
>
> It also seemed to show that nothing was really waiting for it (the
> cnt_* entries).

I can set up a serial console and poke around in DDB during my test case
if anyone thinks some useful information can be found.

Unfortunately I'm remote from the machine right now so I won't be able
to do that until Monday :/

Craig


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-30 Thread Karl Denninger
I wonder if this is related to the breakage of the Rocketport driver (PR is
open, but it appears that nobody has looked at it.)

It breaks specifically when I use a piece of software that does a lot of
select() calls on a terminal line to do pretty much what poll does, but it
is not specific to a uniprocessor or SMP kernel - it is reliably hosed in
both cases.

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net       My home on the net - links to everything I do!
http://scubaforum.org          Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com   Musings Of A Sentient Mind

On Fri, Sep 29, 2006 at 05:14:33AM -0700, David G Lawrence wrote:
> > > Do you have any history of seeing the watchdog timeout problem on your
> > > machine?
> >
> > On this machine no - but it's the only one running em0. On other
> > machines running bge0 then, yes, I see it a lot. But those are all
> > SMP machines, aside from one. On that one I am currently building
> > the latest 6-STABLE and when it's done (give it a couple of hours)
> > I will give it a shot with your code and see what happens.
>
> Another data point: After rebooting my machine, the program no longer
> causes the problem. It appears that something else has to occur first on
> the machine to put it into a state that makes it susceptible to the
> program.
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
   Attached is a simple user program that will immediately cause pretty much
all of the network drivers (at least the ones I own) to stop working and
get watchdog timeouts.

WARNING: This program will kill the network on your 6.x server. Do not run 
this on a production machine unless you are on the console and can ctrl-C
it!

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.
#include <sys/poll.h>

int
main(void)
{
	struct pollfd pfd;

	pfd.fd = 1;	/* stdout */
	pfd.events = POLLOUT;
	pfd.revents = 0;

	/* Spin on poll() until it fails; stdout is normally always writable. */
	while (1) {
		if (poll(&pfd, 1, -1) < 0)
			break;
	}
	return (0);
}

Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> Attached is a simple user program that will immediately cause pretty much
> all of the network drivers (at least the ones I own) to stop working and
> get watchdog timeouts.

   Oh, one more thing - I've only tried this on uni-processor machines. The
only MP machine that I have here is a production machine that I can't test
this on right now.
   If running this on an SMP machine doesn't show the problem, then try
running multiple copies of it (one for each CPU).
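
Running one copy per CPU can be automated. The following is only a
sketch under assumptions: poll_stdout() and run_per_cpu() are
hypothetical helper names, the iteration bound is an addition for safe
experimentation (pass a negative value to get the unbounded behaviour of
the test program), and the CPU count comes from
sysconf(_SC_NPROCESSORS_ONLN). The same warning applies: do not run it
unbounded on a production machine.

```c
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Poll stdout for writability in a tight loop, as the test program does.
 * max_iters < 0 loops until poll() fails; max_iters >= 0 stops after
 * that many calls.
 */
static void
poll_stdout(long max_iters)
{
	struct pollfd pfd;

	pfd.fd = STDOUT_FILENO;
	pfd.events = POLLOUT;
	pfd.revents = 0;

	while (max_iters != 0) {
		if (poll(&pfd, 1, -1) < 0)
			break;
		if (max_iters > 0)
			max_iters--;
	}
}

/*
 * Fork one poller per online CPU and wait for all of them.
 * Returns the number of children that exited cleanly, or -1 if no
 * child could be started.
 */
int
run_per_cpu(long max_iters)
{
	long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
	int started = 0, ok = 0, status;
	long i;

	if (ncpu < 1)
		ncpu = 1;	/* fall back to a single copy */

	for (i = 0; i < ncpu; i++) {
		pid_t pid = fork();

		if (pid == 0) {
			poll_stdout(max_iters);
			_exit(0);
		}
		if (pid > 0)
			started++;
	}

	for (i = 0; i < started; i++) {
		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == 0)
			ok++;
	}
	return (started > 0 ? ok : -1);
}
```

Calling run_per_cpu(-1) from main() reproduces "one unbounded copy per
CPU"; a positive bound gives a short, self-terminating run.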

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Igor Robul
On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> Attached is a simple user program that will immediately cause pretty much
> all of the network drivers (at least the ones I own) to stop working and
> get watchdog timeouts.
>
> WARNING: This program will kill the network on your 6.x server. Do not run
> this on a production machine unless you are on the console and can ctrl-C
> it!

I have tried this program on my workstation and have not seen any
timeouts; the network works fine.

sysadm:~> uname -a
FreeBSD sysadm.stc 6.1-STABLE FreeBSD 6.1-STABLE #4: Fri Aug 11 14:11:18
MSD 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SYSADM  amd64
sysadm:~> ifconfig
nve0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	inet6 fe80::2e0:81ff:fe55:bc54%nve0 prefixlen 64 scopeid 0x1
	inet 192.168.2.26 netmask 0xff00 broadcast 192.168.2.255
	inet 192.168.2.55 netmask 0x broadcast 192.168.2.55
	ether 00:e0:81:55:bc:54
	media: Ethernet autoselect (100baseTX full-duplex)
	status: active




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> > Attached is a simple user program that will immediately cause pretty much
> > all of the network drivers (at least the ones I own) to stop working and
> > get watchdog timeouts.
> >
> > WARNING: This program will kill the network on your 6.x server. Do not run
> > this on a production machine unless you are on the console and can ctrl-C
> > it!
> I have tried this program on my workstation and have not seen any
> timeouts; the network works fine.
> sysadm:~> uname -a
> FreeBSD sysadm.stc 6.1-STABLE FreeBSD 6.1-STABLE #4: Fri Aug 11 14:11:18

   Is this build date also about the same date that you cvsup'd the sources?

> MSD 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/SYSADM  amd64
> sysadm:~> ifconfig
> nve0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> 	inet6 fe80::2e0:81ff:fe55:bc54%nve0 prefixlen 64 scopeid 0x1
> 	inet 192.168.2.26 netmask 0xff00 broadcast 192.168.2.255
> 	inet 192.168.2.55 netmask 0x broadcast 192.168.2.55
> 	ether 00:e0:81:55:bc:54
> 	media: Ethernet autoselect (100baseTX full-duplex)
> 	status: active

   Is this a UP machine or MP machine?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Igor Robul
On Fri, Sep 29, 2006 at 01:16:47AM -0700, David G Lawrence wrote:
> Is this a UP machine or MP machine?

Dual-core AMD64.

sysadm:~> sysctl hw.ncpu
hw.ncpu: 2



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Pete French
> Attached is a simple user program that will immediately cause pretty much
> all of the network drivers (at least the ones I own) to stop working and
> get watchdog timeouts.

I am running this on a single processor machine with an SMP kernel and
it does not have any effect. I don't think I have any single processor
machines running a non SMP kernel to try it on though. Not particularly
helpful I know. I'll try building a non SMP kernel for this machine if
I can.

-pete.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> > Attached is a simple user program that will immediately cause pretty much
> > all of the network drivers (at least the ones I own) to stop working and
> > get watchdog timeouts.
>
> I am running this on a single processor machine with an SMP kernel and
> it does not have any effect. I don't think I have any single processor
> machines running a non SMP kernel to try it on though. Not particularly
> helpful I know. I'll

   Actually, I think it is helpful to know that the program only has an
effect on some machines. We just need to figure out what the common
denominator is.

> try building a non SMP kernel for this machine if I can.

   Do you have any history of seeing the watchdog timeout problem on your
machine?

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Pete French
> Do you have any history of seeing the watchdog timeout problem on your
> machine?

On this machine no - but it's the only one running em0. On other
machines running bge0 then, yes, I see it a lot. But those are all
SMP machines, aside from one. On that one I am currently building
the latest 6-STABLE and when it's done (give it a couple of hours)
I will give it a shot with your code and see what happens.

-pete.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Pete French
> Do you have any history of seeing the watchdog timeout problem on your
> machine?

O.K., I just finished compiling a uniprocessor kernel for the machine
on which I had been seeing bge0 timeouts, and the lopppoll.c code
has no effect there. The kernel I am running is the latest STABLE from
a couple of hours ago.

-pete.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread David G Lawrence
> > Do you have any history of seeing the watchdog timeout problem on your
> > machine?
>
> On this machine no - but it's the only one running em0. On other
> machines running bge0 then, yes, I see it a lot. But those are all
> SMP machines, aside from one. On that one I am currently building
> the latest 6-STABLE and when it's done (give it a couple of hours)
> I will give it a shot with your code and see what happens.

   Another data point: After rebooting my machine, the program no longer
causes the problem. It appears that something else has to occur first on
the machine to put it into a state that makes it susceptible to the
program.

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
I've been experiencing this problem too, along with my USB keyboard
acting 'wonky' (stuttering from time to time).  For me at least it seems
to be tied to CPU usage, meaning it's probably related to the taskqueue
or maybe even the scheduler.  I can also reproduce the problem on a much
bigger scale than I've seen mentioned anywhere else (up to 30 seconds!).

One sure-fire way to trigger it for me is to boot the Ubuntu 6.06.1 CD
inside of qemu.  I don't have kqemu or anything loaded -- it can be
provoked by an ordinary process running as an ordinary user.

While it's sitting at the GRUB screen (30 second countdown), my USB keyboard
becomes inoperable, and em0 goes totally dead.  It feels like no interrupts
are getting through -- if a key was pressed it will repeat until the 30
seconds are up or I kill the process.

I initially suspected something holding Giant for a long time, so I
tried the giantless USB patches but that didn't help.

Interestingly, I have another em interface in this machine but it
continues to work.

em0 is sharing irq19 with uhci1 (which the keyboard is attached to).
em1 is on irq18.  So whatever it is somehow stops irq19 from getting
through, but the other IRQ lines seem unaffected.  Sounding more and
more like an APIC problem to me.  Or possibly the ithread getting stuck.
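
As a way to reason about this symptom, here is a toy model of the
hypothesis Scott describes in the message quoted below: a shared
interrupt line stays masked until every handler on it has run, so a
starved USB handler also blocks the NIC. It is a sketch under
assumptions, not FreeBSD interrupt code; count_watchdog_timeouts() and
its tick-based parameters are hypothetical.

```c
#include <stdbool.h>

/*
 * Toy model, not FreeBSD code.  One tick is one chance for the NIC to
 * raise an interrupt on a line it shares with a USB controller.  The
 * USB handler cannot run during [starve_start, starve_start + starve_len).
 * Returns how many times a watchdog of 'watchdog_ticks' would fire.
 */
int
count_watchdog_timeouts(int ticks, int starve_start, int starve_len,
    int watchdog_ticks)
{
	bool masked = false;	/* shared line masked at the controller */
	int idle = 0;		/* ticks since the NIC handler last ran */
	int timeouts = 0;
	int t;

	for (t = 0; t < ticks; t++) {
		bool usb_can_run =
		    !(t >= starve_start && t < starve_start + starve_len);

		if (!masked) {
			/* Interrupt delivered: mask line, run NIC handler. */
			masked = true;
			idle = 0;
		} else {
			/* Line still masked: the NIC cannot interrupt. */
			idle++;
		}

		/* The line is unmasked only once the USB handler ran too. */
		if (usb_can_run)
			masked = false;

		if (idle >= watchdog_ticks) {
			timeouts++;
			idle = 0;
		}
	}
	return (timeouts);
}
```

With no starvation window the model never times out; once the window is
longer than the watchdog period, timeouts repeat for as long as the USB
handler is kept off the CPU, while a NIC on a different line would be
unaffected.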

This machine *DID* work fine until sometime between 6.1 release and now.
Unfortunately I can't seem to reproduce the problem on any of my test
machines, only on the one that I need for day to day work :)

I'm about halfway through reading the thread, but will be happy to test
any patches and do whatever I can to help.

Craig

[EMAIL PROTECTED]:10:0:  class=0x02 card=0x002e8086 chip=0x100e8086 
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82540EM Gigabit Ethernet Controller'
class= network
subclass = ethernet
[EMAIL PROTECTED]:12:0:  class=0x02 card=0x002e1028 chip=0x100e8086 
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82540EM Gigabit Ethernet Controller'
class= network
subclass = ethernet

On Thu, Sep 28, 2006 at 08:13:51AM -0600, Scott Long wrote:
> All,
>
> Attached is my first cut at addressing the problems described in this
> thread.  As I discussed earlier, the VM syncer thread is likely starving
> the USB interrupt thread.  This causes the shared usb+network interrupt
> to remain masked, preventing network interrupts from being delivered,
> and thus triggering watchdog timeouts.
>
> This patch only addresses the USB driver.  If your network card is
> sharing an interrupt with something other than (or additional to) USB,
> this might not work for you.  Also, this patch is just a very rough
> proof-of-concept and is not meant for production use.  But I'd like to
> get feedback now before I spend more time on this.  If this works then
> I'll clean it up and make it suitable for the release, and I'll look at
> some of the other drivers like ichsmb.
>
> If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
> do not be shy =-)  The patch is at:
>
> http://people.freebsd.org/~scottl/usb_fastintr.diff
>
> Scott
 


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Craig Boston
Doesn't seem to have any effect for me (other than high sys% times).
qemu is really good at provoking my em0 to timeout.

On Fri, Sep 29, 2006 at 12:27:41AM -0700, David G Lawrence wrote:
> Attached is a simple user program that will immediately cause pretty much
> all of the network drivers (at least the ones I own) to stop working and
> get watchdog timeouts.
>
> WARNING: This program will kill the network on your 6.x server. Do not run
> this on a production machine unless you are on the console and can ctrl-C
> it!
>
> -DG
>
> David G. Lawrence
> President
> Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
> The FreeBSD Project - http://www.freebsd.org
> Pave the road of life with opportunities.
>
> #include <sys/poll.h>
>
> int
> main(void)
> {
> 	struct pollfd pfd;
>
> 	pfd.fd = 1;	/* stdout */
> 	pfd.events = POLLOUT;
> 	pfd.revents = 0;
>
> 	while (1) {
> 		if (poll(&pfd, 1, -1) < 0)
> 			break;
> 	}
> 	return (0);
> }



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread John Baldwin
On Friday 29 September 2006 06:37, Pete French wrote:
> > Attached is a simple user program that will immediately cause pretty
> > much all of the network drivers (at least the ones I own) to stop
> > working and get watchdog timeouts.
>
> I am running this on a single processor machine with an SMP kernel and
> it does not have any effect. I don't think I have any single processor
> machines running a non SMP kernel to try it on though. Not particularly
> helpful I know. I'll try building a non SMP kernel for this machine if I
> can.

You can set kern.smp.disabled=1 from the loader to force UP with an SMP 
kernel.  No need to recompile.

-- 
John Baldwin


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 04:21:55PM -0500, Craig Boston wrote:
> Doesn't seem to have any effect for me (other than high sys% times).
> qemu is really good at provoking my em0 to timeout.

What might be useful for someone who can provoke this, is to configure
your kernel with MUTEX_PROFILING, then do the following:

sysctl debug.mutex.prof.enable=1
start_your_test_case
sysctl debug.mutex.prof.enable=0

Then:

sysctl debug.mutex.prof.stats > stats.out

and provide access to that file.

This will help to show whether something is causing Giant starvation.
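
Once stats.out exists, the rows for Giant can be picked out mechanically.
The sketch below is only an illustration and rests on an assumption
about the report layout: six numeric columns (max, total, count, avg,
cnt_hold, cnt_lock) followed by the lock name, which is how I understand
MUTEX_PROFILING(9) to describe it. struct mtx_row, parse_row(),
row_is_contended() and report_contention() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* One row of the report; fields follow the assumed column order. */
struct mtx_row {
	unsigned long long max, total, count, avg, cnt_hold, cnt_lock;
	char name[128];
};

/* Parse one data row; returns 1 on success, 0 for headers or junk. */
int
parse_row(const char *line, struct mtx_row *r)
{
	int n = sscanf(line, "%llu %llu %llu %llu %llu %llu %127[^\n]",
	    &r->max, &r->total, &r->count, &r->avg,
	    &r->cnt_hold, &r->cnt_lock, r->name);

	return (n == 7);
}

/* A row hints at contention if some thread found the lock held. */
int
row_is_contended(const struct mtx_row *r)
{
	return (r->cnt_hold > 0 || r->cnt_lock > 0);
}

/*
 * Print every contended row whose name mentions 'lock'.
 * Returns the number of rows printed, or -1 if the file cannot be read.
 */
int
report_contention(const char *path, const char *lock)
{
	FILE *fp = fopen(path, "r");
	char line[512];
	struct mtx_row r;
	int hits = 0;

	if (fp == NULL)
		return (-1);
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (parse_row(line, &r) && strstr(r.name, lock) != NULL &&
		    row_is_contended(&r)) {
			printf("%s: cnt_hold=%llu cnt_lock=%llu\n",
			    r.name, r.cnt_hold, r.cnt_lock);
			hits++;
		}
	}
	fclose(fp);
	return (hits);
}
```

Calling report_contention("stats.out", "Giant") would list the
acquisition points where Giant was already held by someone else; zero
rows would suggest that nothing was queuing for it during the capture.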

Kris




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Craig Boston
On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
> What might be useful for someone who can provoke this, is to configure
> your kernel with MUTEX_PROFILING, then do the following:
>
> <snip>
>
> This will help to show whether something is causing Giant starvation.

I'm currently building a kernel with Scott's patch -- if it still
happens I'll build one with MUTEX_PROFILING and get the results.

Craig


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Craig Boston
On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
> and provide access to that file.
>
> This will help to show whether something is causing Giant starvation.

http://www.gank.org/freebsd/stats.out

That's after about 25 seconds of the em0 interface being unable to
receive because of an apparent lack of interrupt processing (it can
still transmit though!  I had a half-open ssh session that continued to
receive data for a while).

Interesting data point #1: After a fresh boot, I'm unable to reproduce
the problem until I use the kernel linker.  After kldloading a module
(any module) and then immediately unloading it, my test case works 100%
until I reboot again.

Interesting data point #2: After a reboot, the problem moved from em0 on
irq19 to em1 on irq18.  I'm remote from the machine right now so I can't
test the usb controller that's sharing that interrupt, though I suspect
it experienced the same lack of response.

Craig


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
 http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

At first glance it appeared to work, but I'm about to do some more
testing since I just discovered that I have to kldload something
(anything) first in order to reproduce the problem.  Weird.

One thing this patch definitely did do though, is break the nvidia
driver pretty badly.  Couldn't keep the X server running for more than a
minute before it froze solid.  Lots of Xid: blah blah blah messages.
Yes I remembered to rebuild the kernel module ;)

Oh, and if anyone is curious, I am able to reproduce the problem after
booting without nvidia.ko loaded, using qemu in -nographic mode.  Just
wanted to rule that out since it's code that's out of our control and
would be a prime target to blame if I didn't.

Craig


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Craig Boston
On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
 On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
  http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
 
 At first glance it appeared to work, but I'm about to do some more
 testing since I just discovered that I have to kldload something
 (anything) first in order to reproduce the problem.  Weird.

I can confirm that despite the other side effect I already mentioned,
this patch does fix or at least mask the problem I'm seeing with em (and
probably usb).

Craig


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-29 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 08:03:29PM -0500, Craig Boston wrote:
 On Fri, Sep 29, 2006 at 05:37:40PM -0400, Kris Kennaway wrote:
  and provide access to that file.
  
  This will help to show whether something is causing Giant starvation.
 
 http://www.gank.org/freebsd/stats.out
 
 That's after about 25 seconds of the em0 interface being unable to
 receive because of an apparent lack of interrupt processing (it can
 still transmit though!  I had a half-open ssh session that continued to
 receive data for a while).

   maxtotal   count   avg cnt_hold cnt_lock name
61 5748 285205   24 
/compile/src/sys/kern/kern_conf.c:311 (Giant)
   921 2016  219610 
/compile/src/sys/kern/kern_sysctl.c:1313 (Giant)
27  646  69 901 
/compile/src/sys/kern/kern_conf.c:287 (Giant)
   223113685334 253 
/compile/src/sys/kern/kern_timeout.c:258 (Giant)
67351714496 774 
/compile/src/sys/kern/kern_conf.c:323 (Giant)
40 1421 236 610 
/compile/src/sys/kern/kern_conf.c:299 (Giant)
 2  698 360 10   10 
/compile/src/sys/kern/kern_intr.c:681 (Giant)
  931037046 212   17421 
/compile/src/sys/kern/kern_synch.c:218 (Giant)
  2440 3304  13   25404 
/compile/src/sys/net/netisr.c:339 (Giant)
 18   5 100 
/compile/src/sys/i386/i386/sys_machdep.c:115 (Giant)
29  585  341700 
/compile/src/sys/kern/kern_descrip.c:376 (Giant)
   162  162   1   16200 
/compile/src/sys/kern/uipc_usrreq.c:937 (Giant)
 88   1 800 
/compile/src/sys/kern/uipc_usrreq.c:1032 (Giant)
81  138   26900 
/compile/src/sys/kern/kern_conf.c:265 (Giant)
   423  457   2   22830 
/compile/src/sys/fs/fifofs/fifo_vnops.c:733 (Giant)
41   76   32500 
/compile/src/sys/fs/fifofs/fifo_vnops.c:711 (Giant)
29   29   12900 
/compile/src/sys/kern/vfs_syscalls.c:336 (Giant)

The times are in microseconds.  There are a couple of places where
Giant was held on the order of milliseconds at least once (the max
column).  One thing to note is that you are using IPv6 on this
machine, which is still under Giant; that may be relevant.  Nothing
really stands out to me as being a major problem though.
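
As a rough illustration of how such a trace can be read, the sketch below
flags lock points whose maximum hold time reaches the millisecond range.
Only the column meaning (times in microseconds) comes from this message;
the rows, file names, and the 1000 us threshold are made up for the example
and are not taken from the trace above.

```python
# Hypothetical (max_us, total_us, count, lock_point) rows.
# These values are illustrative only and do NOT come from the real trace.
ROWS = [
    (2500, 14000, 300, "example_a.c:10 (Giant)"),
    (60, 5700, 280, "example_b.c:20 (Giant)"),
    (1200, 3300, 13, "example_c.c:30 (Giant)"),
]


def long_holds(rows, threshold_us=1000):
    """Return (lock_point, max_ms) for rows whose max hold time is >= threshold."""
    return [
        (name, max_us / 1000.0)
        for max_us, _total_us, _count, name in rows
        if max_us >= threshold_us
    ]


if __name__ == "__main__":
    for name, max_ms in long_holds(ROWS):
        print(f"{name}: held for up to {max_ms} ms")
```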

Kris




Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-29 Thread Kris Kennaway
On Fri, Sep 29, 2006 at 08:34:39PM -0500, Craig Boston wrote:
 On Fri, Sep 29, 2006 at 08:19:04PM -0500, Craig Boston wrote:
  On Thu, Sep 28, 2006 at 01:48:42PM -0600, Scott Long wrote:
   http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff
  
  At first glance it appeared to work, but I'm about to do some more
  testing since I just discovered that I have to kldload something
  (anything) first in order to reproduce the problem.  Weird.
 
 I can confirm that despite the other side effect I already mentioned,
 this patch does fix or at least mask the problem I'm seeing with em (and
 probably usb).

Which is odd since the hypothesis Scott was working on should have
shown up clearly in the mutex trace, but did not.

Kris




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-28 Thread Goran Lowkrantz
To add another twist to this: I added options POLLING to the kernel, moved
the firewire and USB drivers out of the kernel, and loaded them as modules.
I have NOT enabled polling on the em interface, but this new kernel, built
from the same sources as the failing one, works without a hitch.


As before, let me know if there is anything I can do to help.

Regards,
Goran L

--On Wednesday, September 27, 2006 13:24:15 +0200 glz 
[EMAIL PROTECTED] wrote:



I have seen the watchdog and reset problem on a -STABLE laptop, with both em
and iwi. It only occurs when I try to connect using the Mulberry e-mail
client, so I thought it could be a problem with the linuxulator.

The load on the box is normally low, but both drivers have shared
interrupts, either with cbb or usb. Here is what I can see:

uname -a:
FreeBSD viglaf 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #55: Thu Sep 21
22:15:38 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/VIGLAF  i386

dmesg:
em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port
0x8000-0x803f mem 0xc022-0xc023,0xc020-0xc020 irq 11 at
device 1.0 on pci2
em0: Ethernet address: 00:0d:60:89:36:e8
em0: [FAST]
iwi0: Intel(R) PRO/Wireless 2915ABG mem 0xc0214000-0xc0214fff irq 9 at
device 2.0 on pci2
iwi0: Ethernet address: 00:16:6f:8b:0a:21

vmstat -i
interrupt                          total       rate
irq0: clk                       11148090        999
irq1: atkbd0                       32271          2
irq5: pcm0 atapci+                157115         14
irq6: fdc0                             1          0
irq7:                                  1          0
stray irq7                             1          0
irq8: rtc                        1426745        127
irq9: cbb1 cbb2++*                 26582          2
irq11: cbb0 em0++*                762544         68
irq12: psm0                       516858         46
irq14: ata0                        43494          3
irq15: ata1                           82          0
Total                           14113784       1265
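
Since much of this thread turns on whether the NIC shares an interrupt, a
small script can pull the shared lines out of `vmstat -i` output. This is
only a sketch: the sample rows reuse values from the output above with the
column spacing assumed, the function name is invented, and treating a
trailing '+' or '*' as "the device list was truncated" is an assumption
about how the name column is shortened.

```python
import re

# Sample rows in the usual `vmstat -i` layout (name, total, rate).
# Values are from the output above; the exact spacing is an assumption.
SAMPLE = """\
interrupt                          total       rate
irq0: clk                       11148090        999
irq1: atkbd0                       32271          2
irq9: cbb1 cbb2++*                 26582          2
irq11: cbb0 em0++*                762544         68
irq12: psm0                       516858         46
Total                           14113784       1265
"""

# irqN: <device names> <total> <rate>
ROW = re.compile(r"^(irq\d+):\s*(.*?)\s+(\d+)\s+(\d+)$")


def shared_irqs(text):
    """Return {irq: [device names]} for IRQ lines that list several devices."""
    shared = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, Total, cpu timer and "stray" lines
        irq, names = m.group(1), m.group(2).split()
        # Assumption: a trailing '+' or '*' marks a truncated device list.
        devices = [n.rstrip("+*") for n in names]
        truncated = any(n != d for n, d in zip(names, devices))
        if len(devices) > 1 or truncated:
            shared[irq] = devices
    return shared


if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "shared by", ", ".join(devices))
```

With the sample above this reports irq9 (cbb1, cbb2) and irq11 (cbb0, em0),
which matches the shared lines described in the message.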

This is a development machine so I can debug and test patches as needed.

Best regards,
Goran L

Patrick M. Hausen wrote:

Hello!


On -stable occasionally other people complained about very similar
looking problems with bge and other drivers. My guess is, though
I'm not a kernel developer, just an experienced admin, that
em stands out as problematic just by coincidence. Certain onboard
network components tend to come with certain chipsets and certain
architectures.


I forgot to mention: we do have systems with em interfaces that
never showed this problem!

Regards,
Patrick



--
... the future isMobile

  Goran Lowkrantz [EMAIL PROTECTED]
  System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
  Phone: +46(0)920-75559
  Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82

http://www.ismobile.com ...




--
... the future isMobile

 Goran Lowkrantz [EMAIL PROTECTED]
 System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
 Phone: +46(0)920-75559
 Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82

http://www.ismobile.com ...



CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Scott Long

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.
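
A toy model may make the mechanism described above easier to follow. It
assumes one interrupt line shared by USB and the NIC, that the line stays
masked from a USB interrupt until the USB interrupt thread has run, and
that the NIC needs service on every tick. The function name, the tick
granularity, and the watchdog threshold are all invented for illustration;
this is not how the kernel or the patch is implemented.

```python
def simulate(usb_irq_ticks, ithread_delay, watchdog_ticks=5, total_ticks=30):
    """Return the ticks at which the (toy) NIC watchdog fires.

    usb_irq_ticks: set of ticks at which the USB controller raises the line.
    ithread_delay: ticks until the USB interrupt thread gets to run; a large
                   value models the thread being starved.
    """
    masked_until = 0      # first tick at which the shared line is unmasked
    pending_since = None  # tick at which NIC work first went unserviced
    timeouts = []
    for tick in range(total_ticks):
        if tick >= masked_until and tick in usb_irq_ticks:
            # USB interrupt: the line is masked until its thread has run.
            masked_until = tick + ithread_delay
        if tick >= masked_until:
            pending_since = None  # NIC interrupt is delivered normally
        else:
            if pending_since is None:
                pending_since = tick
            if tick - pending_since + 1 >= watchdog_ticks:
                timeouts.append(tick)
                pending_since = None  # driver resets the NIC and starts over
    return timeouts


if __name__ == "__main__":
    print("prompt ithread :", simulate({0}, ithread_delay=1))
    print("starved ithread:", simulate({0}, ithread_delay=12))
```

In this model a promptly scheduled interrupt thread produces no timeouts,
while a thread delayed for 12 ticks produces watchdog resets for as long as
the line stays masked.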

This patch only addresses the USB driver.  If your network card is
sharing an interrupt with something other than (or in addition to) USB,
this might not work for you.  Also, this patch is just a very rough
proof-of-concept and is not meant for production use.  But I'd like to
get feedback now before I spend more time on this.  If this works then
I'll clean it up and make it suitable for the release, and I'll look at
some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
do not be shy =-)  The patch is at:

http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-28 Thread Nikolay Pavlov
On Wednesday, 27 September 2006 at  9:40:52 -0700, Jeremy Chadwick wrote:
 On Wed, Sep 27, 2006 at 06:32:59PM +0200, Patrick M. Hausen wrote:
  On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:
   I don't think it has to do with ichsmb specifically here, but only with
   the fact that ichsmb is, for me, exactly the thing that shares the
   interrupt with the em interface that shows the problems.
  
  I can confirm that making em0 share an interrupt with the
  SATA-controller on my box makes the problem much much more
  apparent.
 
 So we're all on the same page here -- this really appears to be some
 kind-of kernel interrupt handler problem (something somewhere is
 getting deadlocked?  Not sure).
 
 Has anyone tried rolling back to previous 6.2 builds to try and
 figure out timeframes when this was introduced?  From my perspective,
 it happened sometime between August and the end of September.

I want to confirm that I see watchdog timeouts on 6.1-RELEASE-p3 with a
GENERIC kernel. USB was completely disabled in the BIOS.
Another box, called media2, is running 6.1-STABLE from Fri Sep  1 11:54:11
EDT 2006 with a GENERIC kernel.
Both boxes are UP machines.

Here is additional info:

===
[EMAIL PROTECTED]:~# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=b<RXCSUM,TXCSUM,VLAN_MTU>
media: Ethernet autoselect (1000baseTX full-duplex)
status: active

===
[EMAIL PROTECTED]:~# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
options=b<RXCSUM,TXCSUM,VLAN_MTU>
media: Ethernet autoselect (1000baseTX full-duplex)
status: active

===
[EMAIL PROTECTED]:~# vmstat -i
interrupt                          total       rate
irq1: atkbd0                         576          0
irq6: fdc0                             9          0
irq14: ata0                           47          0
irq24: amr0                      1314154        135
irq28: em0                      42909062       4415
cpu0: timer                     19436324       2000
Total                           63660172       6550

===
[EMAIL PROTECTED]:~# vmstat -i
interrupt                          total       rate
irq1: atkbd0                          14          0
irq28: em0                    1480465106       3616
irq48: amr0                     66293858        161
irq72: amr1                         6586          0
cpu0: timer                    818728378       2000
Total                         2365493942       5779

===
[EMAIL PROTECTED]:~# pciconf -lv
[EMAIL PROTECTED]:0:0:class=0x06 card=0x348015d9 chip=0x254c8086
rev=0x01 hdr=0x00
vendor   = 'Intel Corporation'
device   = 'E7501 Host Controller'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:0:1: class=0xff card=0x348015d9 chip=0x25418086 rev=0x01
hdr=0x00
vendor   = 'Intel Corporation'
device   = 'E7500 System Controller (MCH, Hub Interface A) Error
Reporter'
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x chip=0x25438086 rev=0x01
hdr=0x01
vendor   = 'Intel Corporation'
device   = 'E7500/E7501 HI_B Virtual PCI-to-PCI Bridge'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:3:0: class=0x060400 card=0x chip=0x25458086 rev=0x01
hdr=0x01
vendor   = 'Intel Corporation'
device   = 'E7500/E7501 HI_C Virtual PCI-to-PCI Bridge'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:29:0:class=0x0c0300 card=0x348015d9 chip=0x24828086
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:29:1:class=0x0c0300 card=0x348015d9 chip=0x24848086
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:29:2:class=0x0c0300 card=0x348015d9 chip=0x24878086
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82801CA/CAM (ICH3-S/ICH3-M) USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:30:0:class=0x060400 card=0x chip=0x244e8086
rev=0x42 hdr=0x01
vendor   = 'Intel Corporation'
device   = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB
Hub Interface to PCI Bridge'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:31:0:class=0x060100 card=0x chip=0x24808086
rev=0x02 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82801CA/CAM (ICH3-S/ICH3-M) LPC Interface'
class= bridge
subclass = PCI-ISA
[EMAIL PROTECTED]:31:1:  

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread O. Hartmann

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.

This patch only addresses the USB driver.  If your network card is
sharing an interrupt with something other than (or in addition to) USB,
this might not work for you.  Also, this patch is just a very rough
proof-of-concept and is not meant for production use.  But I'd like to
get feedback now before I spend more time on this.  If this works then
I'll clean it up and make it suitable for the release, and I'll look at
some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
do not be shy =-)  The patch is at:

http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott


patch does not work on my box:

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=athlon64 
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-fformat-extensions -std=c99  -nostdinc -I-  -I. -I/usr/src/sys 
-I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath 
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm 
-I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param 
inline-unit-growth=100 --param large-function-growth=1000  
-mcmodel=kernel -mno-red-zone  -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx 
-mno-3dnow  -msoft-float -fno-asynchronous-unwind-tables -ffreestanding 
-Werror  /usr/src/sys/dev/usb/usb.c

/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)
/usr/src/sys/dev/usb/usb.c:282: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used

*** Error code 1

Stop in /usr/obj/usr/src/sys/THOR.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.


Uname:

FreeBSD my.box.org 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #85: Thu Sep 28 
17:09:24 UTC 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THOR  amd64



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Scott Long

O. Hartmann wrote:

Scott Long wrote:


All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.

This patch only addresses the USB driver.  If your network card is
sharing an interrupt with something other than (or in addition to) USB,
this might not work for you.  Also, this patch is just a very rough
proof-of-concept and is not meant for production use.  But I'd like to
get feedback now before I spend more time on this.  If this works then
I'll clean it up and make it suitable for the release, and I'll look at
some of the other drivers like ichsmb.

If this is to be fixed for 6.2, I need lots of feedback ASAP.  So please
do not be shy =-)  The patch is at:

http://people.freebsd.org/~scottl/usb_fastintr.diff

Scott



patch does not work on my box:

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=athlon64 
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  
-fformat-extensions -std=c99  -nostdinc -I-  -I. -I/usr/src/sys 
-I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath 
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm 
-I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param 
inline-unit-growth=100 --param large-function-growth=1000  
-mcmodel=kernel -mno-red-zone  -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx 
-mno-3dnow  -msoft-float -fno-asynchronous-unwind-tables -ffreestanding 
-Werror  /usr/src/sys/dev/usb/usb.c

/usr/src/sys/dev/usb/usb.c: In function `usb_attach':
/usr/src/sys/dev/usb/usb.c:282: error: `usb_intr_task' undeclared (first use in this function)
/usr/src/sys/dev/usb/usb.c:282: error: (Each undeclared identifier is reported only once
/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used

*** Error code 1

Stop in /usr/obj/usr/src/sys/THOR.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.



I accidentally posted a patch against HEAD, not RELENG_6.  I'll correct 
that shortly.


Scott


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Mike Tancsa

At 03:15 PM 9/28/2006, O. Hartmann wrote:


/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but not used
*** Error code 1



Are you sure the patch applied cleanly to STABLE ?  There are a 
couple of spots you need to change manually as it assumes the version 
of USB from HEAD.


Manually apply the patch for usb.c and ohci_pci.c if you are using 
STABLE and remove the offending bits from the patch and it should 
compile cleanly.


---Mike 




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-28 Thread Alan Amesbury
Additional data point:  On 6.1-RELEASE I've observed the same sort of
behavior, but without any noticeable consistency.  It affects bge(4) and
em(4) systems.  In the case of the bge(4)-equipped system, there's a
very weak correlation between heavy disk activity and watchdog timeouts.
 However, on that system, it doesn't look like the network card shares
its PCI bus and interrupt with any other devices:

bgehost % pciconf -l
[EMAIL PROTECTED]:0:0:class=0x06 card=0x chip=0x00081166
rev=0x23 hdr=0x00
[EMAIL PROTECTED]:0:1:class=0x06 card=0x chip=0x00081166
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:2:class=0x06 card=0x chip=0x00061166
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:0:3:class=0x06 card=0x chip=0x00061166
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:8:0:  class=0x01 card=0xe2a09005 chip=0x00809005 rev=0x02
hdr=0x00
[EMAIL PROTECTED]:14:0:class=0x03 card=0x00d11028 chip=0x47521002
rev=0x27 hdr=0x00
[EMAIL PROTECTED]:15:0:class=0x060100 card=0x02001166 chip=0x02001166
rev=0x50 hdr=0x00
[EMAIL PROTECTED]:15:1:  class=0x01018a card=0x chip=0x0266
rev=0x00 hdr=0x00
[EMAIL PROTECTED]:15:2:class=0x0c0310 card=0x02201166 chip=0x02201166
rev=0x04 hdr=0x00
[EMAIL PROTECTED]:8:0:  class=0x02 card=0x00d11028 chip=0x164414e4 rev=0x12
hdr=0x00
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0068 chip=0x09628086 rev=0x01
hdr=0x01
[EMAIL PROTECTED]:2:1:  class=0x010400 card=0x00d11028 chip=0x00021028 rev=0x01
hdr=0x00
[EMAIL PROTECTED]:4:0:  class=0x02 card=0x009b1028 chip=0x12298086 rev=0x08
hdr=0x00
bgehost % grep irq /var/run/dmesg.boot
ioapic0 Version 1.1 irqs 0-15 on motherboard
ioapic1 Version 1.1 irqs 16-31 on motherboard
ahc0: Adaptec 29160 Ultra160 SCSI adapter port 0xec00-0xecff mem
0xfe102000-0xfe102fff irq 18 at device 8.0 on pci0
ohci0: OHCI (generic) USB controller mem 0xfe10-0xfe100fff irq 5
at device 15.2 on pci0
bge0: Broadcom BCM5700 Gigabit Ethernet, ASIC rev. 0x7102 mem
0xfeb0-0xfeb0 irq 17 at device 8.0 on pci1
aac0: Dell PERC 3/Di mem 0xf000-0xf7ff irq 31 at device 2.1 on
pci2
fxp0: Intel 82559 Pro/100 Ethernet port 0xccc0-0xccff mem
0xfe90-0xfe900fff,0xfe70-0xfe7f irq 16 at device 4.0 on pci2
fdc0: floppy drive controller port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
psm0: PS/2 Mouse irq 12 on atkbdc0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0


This is an SMP host (a pair of Pentium IIIs).

The em(4)-equipped host emits watchdog timeout warnings far more
frequently, but not with any discernible pattern.  However, it routinely
handles a *lot* more network traffic, and that traffic is unpredictable
and bursty in nature.  Its interfaces also appear to have their own
resources allocated:

emhost %pciconf -l
[EMAIL PROTECTED]:0:0:class=0x06 card=0x chip=0x25788086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:3:0: class=0x060400 card=0x chip=0x257b8086 rev=0x02
hdr=0x01
[EMAIL PROTECTED]:28:0:class=0x060400 card=0x0050 chip=0x25ae8086
rev=0x02 hdr=0x01
[EMAIL PROTECTED]:29:0:class=0x0c0300 card=0x01651028 chip=0x25a98086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:1:class=0x0c0300 card=0x01651028 chip=0x25aa8086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:4:class=0x088000 card=0x01651028 chip=0x25ab8086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:5:class=0x080020 card=0x01651028 chip=0x25ac8086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:29:7:class=0x0c0320 card=0x01651028 chip=0x25ad8086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:30:0:class=0x060400 card=0x chip=0x244e8086
rev=0x0a hdr=0x01
[EMAIL PROTECTED]:31:0:class=0x060100 card=0x chip=0x25a18086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:31:2:  class=0x01018a card=0x01651028 chip=0x25a38086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:31:3:class=0x0c0500 card=0x01651028 chip=0x25a48086
rev=0x02 hdr=0x00
[EMAIL PROTECTED]:1:0:   class=0x02 card=0x01651028 chip=0x10758086 rev=0x00
hdr=0x00
[EMAIL PROTECTED]:1:0:   class=0x02 card=0x10128086 chip=0x10108086 rev=0x01
hdr=0x00
[EMAIL PROTECTED]:1:1:   class=0x02 card=0x10128086 chip=0x10108086 rev=0x01
hdr=0x00
[EMAIL PROTECTED]:2:0:   class=0x02 card=0x01651028 chip=0x10768086 rev=0x00
hdr=0x00
[EMAIL PROTECTED]:3:0:  class=0x010400 card=0x05201028 chip=0x19601000 rev=0x01
hdr=0x00
[EMAIL PROTECTED]:14:0:class=0x03 card=0x01651028 chip=0x47521002
rev=0x27 hdr=0x00
emhost %grep irq /var/run/dmesg.boot
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
em0: Intel(R) PRO/1000 Network Connection Version - 3.2.18 port
0xece0-0xecff mem 0xfe3e-0xfe3f irq 18 at device 1.0 on pci1
em1: Intel(R) PRO/1000 Network 

Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Scott Long

Mike Tancsa wrote:

At 03:15 PM 9/28/2006, O. Hartmann wrote:


/usr/src/sys/dev/usb/usb.c:282: error: for each function it appears in.)
/usr/src/sys/dev/usb/usb.c: At top level:
/usr/src/sys/dev/usb/usb.c:863: warning: 'usb_intr_task' defined but 
not used

*** Error code 1




Are you sure the patch applied cleanly to STABLE ?  There are a couple 
of spots you need to change manually as it assumes the version of USB 
from HEAD.


Manually apply the patch for usb.c and ohci_pci.c if you are using 
STABLE and remove the offending bits from the patch and it should 
compile cleanly.


---Mike


Corrected patch is at:

http://people.freebsd.org/~scottl/usb_fastintr_RELENG_6.diff

Sorry for the confusion.

Scott



Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Mike Jakubik

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.


Just to be clear, has it been established that the problem only occurs
when em is sharing an interrupt? I have a lot of production machines
using the PDSMi board, which is one of the boards that the problem was
noticed on; however, I do not share any IRQs, and I always disable USB in
the BIOS.


# vmstat -i
interrupt  total   rate
irq16: em0  13001181  7
irq19: atapci0  76559511 42
cpu0: timer   3643365617   1999
cpu1: timer   3643365610   1999
Total 7376291919   4048




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-28 Thread Patrick M. Hausen
Hi!

On Thu, Sep 28, 2006 at 02:47:09PM -0500, Alan Amesbury wrote:
 Additional data point:  On 6.1-RELEASE I've observed the same sort of
 behavior, but without any noticeable consistency.  It affects bge(4) and
 em(4) systems.  In the case of the bge(4)-equipped system, there's a
 very weak correlation between heavy disk activity and watchdog timeouts.
  However, on that system, it doesn't look like the network card shares
 its PCI bus and interrupt with any other devices:

Same here, just to make sure to get that point through:

em doesn't share an interrupt with anything else
- hang will occur sooner or later if the system is busy
(sometimes later, but reproducibly)

force system to share interrupt of, say, ata0 and em0
- immediate *kaboom* whenever both are busy

HTH,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Philippe Pegon

Mike Jakubik wrote:

Scott Long wrote:

All,

Attached is my first cut at addressing the problems described in this
thread.  As I discussed earlier, the VM syncer thread is likely starving
the USB interrupt thread.  This causes the shared usb+network interrupt
to remain masked, preventing network interrupts from being delivered,
and thus triggering watchdog timeouts.


Just to be clear, has it been established that the problem only occurs
when em is sharing an interrupt? I have a lot of production machines
using the PDSMi board, which is one of the boards that the problem was
noticed on; however, I do not share any IRQs, and I always disable USB in
the BIOS.


On many of our servers, we have bge cards and I can see a lot of
watchdog timeouts. We always disable USB in the BIOS and the cards don't
share an IRQ.




# vmstat -i
interrupt  total   rate
irq16: em0  13001181  7
irq19: atapci0  76559511 42
cpu0: timer   3643365617   1999
cpu1: timer   3643365610   1999
Total 7376291919   4048


example with our ftp server (ftp8.fr.freebsd.org), a HP DL360 G4 SMP :

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                        1576          0
irq4: sio0                             3          0
irq6: fdc0                            12          0
irq14: ata0                           57          0
irq24: ciss1                    17181184          8
irq25: bge0                    841821262        402
irq26: bge1                    674342644        322
irq72: ciss0                    24194679         11
cpu0: timer                   4180478365       1999
cpu1: timer                   4180886439       1999
Total                         9918906221       4743

# bzgrep watchdog /var/log/messages*
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout 
-- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog 
timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog 
timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog 
timeout -- resetting
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog 
timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog 
timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog 
timeout -- resetting
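
To compare how often each interface is affected, the grep output above can
be tallied per device. The sketch below is only illustrative: the log lines
are the ones shown above with the wrapped lines rejoined, and the regular
expression and function name are assumptions that match this particular
"watchdog timeout" message format only.

```python
import re
from collections import Counter

# Lines copied from the bzgrep output above, with wrapped lines rejoined.
LOG = """\
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting
"""

# Device name (letters followed by a unit number) before the message text.
WATCHDOG = re.compile(r"kernel: ([a-z]+\d+): watchdog timeout")


def count_timeouts(text):
    """Count watchdog timeout messages per network interface."""
    counts = Counter()
    for line in text.splitlines():
        m = WATCHDOG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    for device, n in count_timeouts(LOG).most_common():
        print(f"{device}: {n} watchdog timeouts")
```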


# pciconf -lv
[EMAIL PROTECTED]:0:0:class=0x06 card=0x32000e11 chip=0x35908086 
rev=0x0a hdr=0x00

vendor   = 'Intel Corporation'
device   = 'E752x Server Memory Controller Hub'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:2:0: class=0x060400 card=0x0050 chip=0x35958086 rev=0x0a 
hdr=0x01

vendor   = 'Intel Corporation'
device   = 'E752x Memory Controller Hub PCI Express Port A0'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:4:0: class=0x060400 card=0x0050 chip=0x35978086 rev=0x0a 
hdr=0x01

vendor   = 'Intel Corporation'
device   = 'E752x Memory Controller Hub PCI Express Port B0'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:6:0: class=0x060400 card=0x0050 chip=0x35998086 rev=0x0a 
hdr=0x01

vendor   = 'Intel Corporation'
device   = 'E752x Memory Controller Hub PCI Express Port C0'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:28:0:class=0x060400 card=0x0050 chip=0x25ae8086 
rev=0x02 hdr=0x01

vendor   = 'Intel Corporation'
device   = '6300ESB Hub Interface to PCI-X Bridge'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:30:0:class=0x060400 card=0x chip=0x244e8086 
rev=0x0a hdr=0x01

vendor   = 'Intel Corporation'
device   = '82801BA/CA/DB/DBL/EB/ER/FB (ICH2/3/4/4/5/5/6), 6300ESB 
Hub Interface to PCI Bridge'

class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:31:0:class=0x060100 card=0x chip=0x25a18086 
rev=0x02 hdr=0x00

vendor   = 'Intel Corporation'
device   = '6300ESB LPC Interface Bridge'
class= bridge
subclass = PCI-ISA
[EMAIL PROTECTED]:31:1:  class=0x01018a card=0x32010e11 chip=0x25a28086 
rev=0x02 hdr=0x00

vendor   = 'Intel Corporation'
device   = '6300ESB IDE Controller'
class= mass storage
subclass = ATA
[EMAIL PROTECTED]:0:0: class=0x060400 card=0x0044 chip=0x03298086 rev=0x09 
hdr=0x01

vendor   = 'Intel Corporation'
device   = '6700PXH PCI Express-to-PCI Express Bridge A'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:0:2: class=0x060400 card=0x0044 chip=0x032a8086 rev=0x09 
hdr=0x01

vendor   = 'Intel Corporation'
device   = '6700PXH PCI Express-to-PCI Express Bridge B'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:1:0:class=0x010400 card=0x409b0e11 chip=0x00460e11 
rev=0x01 hdr=0x00


Re: CALL FOR TESTERS! [Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2]

2006-09-28 Thread Pete French
 On many of our servers, we have bge cards and I can see a lot of 
 watchdog timeouts. We always disable USB in the bios and they didn't 
 share irq.

I see the same thing - we have a number of HP blades which use bge interfaces
and I get many watchdog timeouts on them. These are also not sharing any
interrupts:

interrupt  total   rate
irq1: atkbd0   2  0
irq24: ciss0   13208 11
irq74: bge1   1452046216120
cpu0: timer   2581779930214
cpu2: timer   2579262777214
cpu1: timer   2581771929214
cpu3: timer   2579262777214
Total11909678839989

This is 6.1 - I have a couple of boxes running 6.2 and those have not
shown any timeouts so far. They are, however, far more lightly loaded.

-pete.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Oliver Brandmueller
Hi,

On Wed, Sep 27, 2006 at 08:00:21AM +0200, Martin Nilsson wrote:
 I get tons of these:
 em0: watchdog timeout -- resetting
 em0: link state changed to DOWN
 em0: link state changed to UP
 
 mailbox# pciconf -lv
 [EMAIL PROTECTED]:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 
 rev=0x03 
 hdr=0x00
 vendor   = 'Intel Corporation'
 device   = 'PRO/1000 PM'
 class= network
 subclass = ethernet
 [EMAIL PROTECTED]:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 
 rev=0x00 
 hdr=0x00
 vendor   = 'Intel Corporation'
 class= network
 subclass = ethernet
 
[...]
 I have only seen them on em0. Yesterday I tried sysutils/cpuburn on 
 similar boxes that are netbooted with NFS mounted drives and everytime I 
 loaded the two CPU cores the network went down.

I see the same.

Very much so on this one, where I work around the problem by using polling;
it's a UP machine.

FreeBSD nessie 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #3: Fri Sep 15 09:48:36 
CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NESSIE  i386

[EMAIL PROTECTED]:1:0:   class=0x02 card=0x10198086 chip=0x10198086 
rev=0x00 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82547EI Gigabit Ethernet Controller (LOM)'
class= network
subclass = ethernet

irq18: em0 uhci23319  0


Another machine, also UP, but with two interfaces. The problem is not as 
apparent as on the first machine, but it's there. This machine is not as 
loaded usually (CPU wise) as the first machine. The problem is ONLY on 
em1:

FreeBSD hudson 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #48: Thu Sep 14 10:19:46 
CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6  i386

[EMAIL PROTECTED]:1:0:   class=0x02 card=0x10758086 chip=0x10758086 
rev=0x00 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82547EI Gigabit Ethernet Controller'
class= network
subclass = ethernet

[EMAIL PROTECTED]:2:0:   class=0x02 card=0x10768086 chip=0x10768086 
rev=0x00 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82547EI Gigabit Ethernet Controller'
class= network
subclass = ethernet

irq17: em1 ichsmb0 950121879855
irq18: em0  71437344 64


The problem appeared after the em updates during the last weeks in the
kernel and has not been observed before this. em is always loaded as a 
module in my kernels. The problem seems to occur more often if the 
machine's CPU is busy.


I have several SMP machines with the following em interfaces, which 
DON'T show the problem, but they also have a different chipset on the em 
interface. Most of the kernels were built between Sep 7 and Sep 19.

3 times this:
[EMAIL PROTECTED]:5:0:   class=0x02 card=0x34248086 chip=0x10108086 
rev=0x01 hdr=0x00
[EMAIL PROTECTED]:5:1:   class=0x02 card=0x34248086 chip=0x10108086 
rev=0x01 hdr=0x00
irq23: em0 970303432750



3 times this:
[EMAIL PROTECTED]:5:0:   class=0x02 card=0x34258086 chip=0x100e8086 
rev=0x02 hdr=0x00
irq23: em0 292477376435


So I can observe at least 3 interesting differences:

- the interface showing the problems shares the interrupt
- for me it happens on UP machines only
- the chips are different

What I can't do: move the interfaces between machines; these are 
onboard interfaces.

What I could do: I could try to unload the USB driver or the ichsmb 
driver on the machines where the interrupts are shared. Anyway, the USB 
is not used currently (I have it enabled to be prepared to hook up a USB 
Mass Storage device, which has never happened since the problem occurred). 
The ichsmb also is usually not queried.

Any suggestions on how I could help?

- Olli


-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   Ich bin das Internet. Sowahr ich Gott helfe.   |
| Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Scott Long

Oliver Brandmueller wrote:


[...]




Well, the best I can say at the moment is, Wow.  =-(  I guess the 
thing to do here is to figure out if the problem lies with the em 
interrupt handler not getting run, or the taskqueue not getting run.

Since you've stated that it seems to be related to shared interrupts,
the first possibility is more likely.  However, I'm not sure why the
symptom would only be showing up now.  The Intel docs say that the
82547EI are a bit interesting, and I wonder if assumptions that we
make about PCI ordering aren't true (or if there are bugs that make
our assumptions invalid).

Does this happen after there has been a lot of disk activity, like a
large tar extraction?  Are you using the SMBus interface at all, or is
it sitting completely idle?

Scott



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hello!

 Well, the best I can say at the moment is, Wow.  =-(  I guess the 
 thing to do here is to figure out if the problem lies with the em 
 interrupt handler not getting run, or the taskqueue not getting run.

I helped Pyun with some debugging by providing ssh access to
a machine showing the (seemingly) same problem.

At first he thought the interrupt handler of the em driver was
the culprit, but we applied quite a few patches and tested
afterwards - seems like the driver is not the cause.

On -stable occasionally other people complained about very similar
looking problems with bge and other drivers. My guess is, though 
I'm not a kernel developer, just an experienced admin, that
em stands out as problematic just by coincidence. Certain onboard
network components tend to come with certain chipsets and certain
architectures.

So, Pyun suggested it was a problem with the taskqueue that was
introduced some time between 6.0 and 6.1.

With my system (Tyan GT20 B5161G20) the problem shows when there
is heavy disk and cpu activity, like make buildworld.
I made sure that the em interface doesn't share an interrupt
with the SATA controller. When the problem occurs, I get the
well known watchdog timeout messages and then the system's
network activity over that interface freezes completely for
a couple of minutes.
Usually the system recovers after a while without reboot or
other measures.

What I can do: give ssh access to a system showing this behaviour
including a network connection to another box, so one can transfer
large amounts of data over a private LAN. I used FTP of a sparse
big file.

Prerequisite: fixed IP address of the machine that the developer
wishes to use to connect to my system.

HTH,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Ulrich Spoerlein

On 9/27/06, Martin Nilsson [EMAIL PROTECTED] wrote:


mailbox# uname -a
FreeBSD mailbox 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Fri Sep 22
00:31:29 CEST 2006
[EMAIL PROTECTED]:/usr/obj-local/usr/src/sys/SMP  amd64

I get tons of these:
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP

mailbox# pciconf -lv
[EMAIL PROTECTED]:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03
hdr=0x00
 vendor   = 'Intel Corporation'
 device   = 'PRO/1000 PM'
 class= network
 subclass = ethernet
[EMAIL PROTECTED]:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00
hdr=0x00
 vendor   = 'Intel Corporation'
 class= network
 subclass = ethernet

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 options=b<RXCSUM,TXCSUM,VLAN_MTU>
 inet6 fe80::230:48ff:fe89:c958%em0 prefixlen 64 scopeid 0x1
 inet 192.168.10.2 netmask 0xff00 broadcast 192.168.10.255
 ether 00:30:48:89:c9:58
 media: Ethernet autoselect (1000baseTX full-duplex)
 status: active



We have several SMP systems with onboard em0/em1 interfaces running on a
RELENG_6 snapshot taken at 2006-09-20 00:00+0. They are not in production
yet, so the load is not that high. However, I haven't seen any watchdog
timeouts on them. The only annoyance is that the em(4) interfaces take too
long to come up, i.e., the system will boot and run ifconfig, but the
interface still has no link, so syslogd/ntpdate/ntpd will complain about 'no
route to host'. A 'sleep 5' fixes that problem, though I'd like to avoid
such hacks.
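
One possible alternative to a fixed 'sleep 5' is to poll the interface until
ifconfig reports 'status: active', as in the ifconfig output quoted above.
The following Python sketch only illustrates that idea. The function names
(link_is_active, read_ifconfig, wait_for_link) and the timeout and interval
values are assumptions made for this example, not part of any FreeBSD tool.

```python
import subprocess
import time


def link_is_active(ifconfig_output):
    """Return True if the ifconfig output contains a 'status: active' line."""
    for line in ifconfig_output.splitlines():
        if line.strip() == "status: active":
            return True
    return False


def read_ifconfig(ifname):
    """Run 'ifconfig <ifname>' and return its standard output as text."""
    result = subprocess.run(
        ["ifconfig", ifname], capture_output=True, text=True, check=False
    )
    return result.stdout


def wait_for_link(ifname, timeout=30.0, interval=0.5,
                  read_status=read_ifconfig, sleep=time.sleep):
    """Poll until the link is active or roughly `timeout` seconds have passed.

    Returns True as soon as the link is reported active, False on timeout.
    `read_status` and `sleep` are injectable so the loop can be tested
    without a real network interface.
    """
    waited = 0.0
    while True:
        if link_is_active(read_status(ifname)):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

A boot script could call wait_for_link("em0") before starting the daemons
that need the network, and fall back to the old behaviour if it returns
False.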

Anyway, here's the data:

[EMAIL PROTECTED]:2:0:   class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03
hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82546EB Dual Port Gigabit Ethernet Controller'
   class= network
   subclass = ethernet
[EMAIL PROTECTED]:2:1:   class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03
hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82546EB Dual Port Gigabit Ethernet Controller'
   class= network
   subclass = ethernet

em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port
0x3040-0x307f mem 0xd832-0xd833 irq 54 at device 2.0 on pci3
em0: Ethernet address: XX
em0: [FAST]
em1: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port
0x3080-0x30bf mem 0xd834-0xd835 irq 55 at device 2.1 on pci3
em1: Ethernet address: XX
em1: [FAST]
em0: link state changed to UP

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
   options=b<RXCSUM,TXCSUM,VLAN_MTU>
   inet 1.2.3.4 netmask 0xff00 broadcast 1.2.3.4
   ether X
   media: Ethernet autoselect (100baseTX full-duplex)
   status: active

Hope this helps to narrow down the problem.
Uli


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hello!

 On -stable occasionally other people complained about very similar
 looking problems with bge and other drivers. My guess is, though 
 I'm not a kernel developer, just an experienced admin, that
 em stands out as problematic just by coincidence. Certain onboard
 network components tend to come with certaiin chipsets and certain
 architectures.

I forgot to mention: we do have systems with em interfaces that
never showed this problem!

Regards,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread glz
I have seen the watchdog and reset problem on a -STABLE laptop, on both em 
and iwi. It only occurs when I try to connect using the Mulberry e-mail 
client, so I thought it could be a problem with the linuxulator.


The load on the box is normally low but both drivers have shared 
interrupts, either with cbb or usb. Here is what I can see:


uname -a:
FreeBSD viglaf 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #55: Thu Sep 21 
22:15:38 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/VIGLAF  i386


dmesg:
em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 
0x8000-0x803f mem 0xc022-0xc023,0xc020-0xc020 irq 11 at 
device 1.0 on pci2

em0: Ethernet address: 00:0d:60:89:36:e8
em0: [FAST]
iwi0: Intel(R) PRO/Wireless 2915ABG mem 0xc0214000-0xc0214fff irq 9 at 
device 2.0 on pci2

iwi0: Ethernet address: 00:16:6f:8b:0a:21

vmstat -i
interrupt  total   rate
irq0: clk   11148090999
irq1: atkbd0   32271  2
irq5: pcm0 atapci+157115 14
irq6: fdc0 1  0
irq7:  1  0
stray irq7 1  0
irq8: rtc1426745127
irq9: cbb1 cbb2++* 26582  2
irq11: cbb0 em0++*762544 68
irq12: psm0   516858 46
irq14: ata043494  3
irq15: ata1   82  0
Total   14113784   1265

This is a development machine so I can debug and test patches as needed.

Best regards,
Goran L


--
... the future isMobile

 Goran Lowkrantz [EMAIL PROTECTED]
 System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
 Phone: +46(0)920-75559
 Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82

http://www.ismobile.com ...


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Gerrit Kühn
On Wed, 27 Sep 2006 13:24:15 +0200 glz [EMAIL PROTECTED]
wrote about Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2:

G I have seen the watchdog and reset problem on a -STABLE laptop, both em 
G and iwi. It only occur when I try to connect using Mulberry e-mail 
G client so I thought it could be a problem with the linuxilator.

Same (or at least similar) behaviour here on an HP/Compaq nx7010 with an
internal rl interface. I can trigger the problems using cvsup (even at
moderate speeds connected via ADSL). Just drop me a note saying what further
information is needed for debugging.


cu
  Gerrit


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Philippe Pegon

Hi,

It's just a "me too". On our ftp server (ftp8.fr.freebsd.org), we sometimes
see watchdog timeouts in the log with a bge card, but maybe it's
not the same problem:

/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: link state changed to DOWN
/var/log/messages:Sep 23 02:47:11 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 12 22:22:51 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 17 15:22:06 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 20 12:13:11 anubis kernel: bge1: link state changed to UP
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.1.bz2:Sep  6 08:33:59 anubis kernel: bge1: link state changed to UP
/var/log/messages.2.bz2:Sep  4 17:39:25 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.2.bz2:Sep  4 17:39:28 anubis kernel: bge1: link state changed to UP
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: link state changed to DOWN
/var/log/messages.3.bz2:Aug 29 12:09:41 anubis kernel: bge0: link state changed to UP
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: link state changed to DOWN
/var/log/messages.4.bz2:Aug 22 15:44:03 anubis kernel: bge0: link state changed to UP
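
To compare machines, it may help to count how often each interface hits the
watchdog. The Python sketch below tallies "watchdog timeout" entries per
interface from syslog-style lines like the ones above. The function name
count_watchdog_timeouts is hypothetical, and the code assumes each log entry
sits on a single line, as syslog writes it, and carries the "kernel:" prefix.

```python
import re
from collections import Counter

# Matches kernel log lines such as
#   "... anubis kernel: bge1: watchdog timeout -- resetting"
WATCHDOG_RE = re.compile(r"kernel: (\w+): watchdog timeout")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to its watchdog timeout count."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few entries taken from the log excerpt in this message.
sample = [
    "Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting",
    "Sep 23 02:47:06 anubis kernel: bge1: link state changed to DOWN",
    "Sep 23 02:47:11 anubis kernel: bge1: link state changed to UP",
    "Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting",
]

for ifname, count in sorted(count_watchdog_timeouts(sample).items()):
    print(ifname, count)
```

Link state changes are deliberately not counted, since a link can also go
down for unrelated reasons such as a cable pull or a switch reboot.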

--
Philippe Pegon



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Adrian Chadd

Me Too(tm).

FreeBSD jacinta.home.cacheboy.net 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0:
Mon Sep 18 07:59:50 UTC 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
i386

Lots of this in dmesg:

em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP

vmstat -i:
irq16: em01053995830   2844

According to dmesg only em0 is on the bus. This is on an NForce2 board with
an AMD 1800XP+.


Adrian


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hi!

On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:

 it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes
 we see some watchdog timeout in the log with a bge card, but maybe it's
 not the same problem... :

As far as I know the watchdog timeouts are _supposed_ to be
mostly harmless, i.e. recoverable.

Some people experience additional complete hangs of network
communications, that may or may not be related to them.

Regards,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Adrian Chadd

s/is on the bus/is alone on the irq/.

(And it shows up when I'm running polygraph and apachebench tests.)


On 9/27/06, Adrian Chadd [EMAIL PROTECTED] wrote:


Me Too(tm).

FreeBSD jacinta.home.cacheboy.net 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE#0: Mon 
Sep 18 07:59:50 UTC 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386

Lots of this in dmesg:

em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP

vmstat -i:
irq16: em01053995830   2844

According to dmesg only em0 is on the bus. This is on an NForce2 board
with an AMD 1800XP+.


Adrian





--
Adrian Chadd - [EMAIL PROTECTED]


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
 On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
  it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes
  we see some watchdog timeout in the log with a bge card, but maybe it's
  not the same problem... :
 
 As far as I know the watchdog timeouts are _supposed_ to be
 mostly harmless, i.e. recoverable.

You'll still see impact -- that is, no packets flowing.  The
reason things are recoverable is solely because of the retry
functionality for layer 2 packets...

In general, it's not a good thing to have watchdog timeouts.
It means the interrupt is hung, or the card is hung.

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networkinghttp://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 09:06:09PM +0800, Adrian Chadd wrote:
 Me Too(tm).

Me three -- and the interesting part (in my case) is that em0
shares an IRQ with the ATA controller.

http://www.freebsd.org/cgi/query-pr.cgi?pr=103435

Because people are reporting this on more than just the em driver
(bge driver as well), my guess is that it's not specific to the
Ethernet drivers.

I've seen some semi-recent commits pertaining to the APIC handling
code -- could these explain what's happening?

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networkinghttp://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Mike Tancsa

At 09:25 AM 9/27/2006, Patrick M. Hausen wrote:

Hi!

On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:

 it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes
 we see some watchdog timeout in the log with a bge card, but maybe it's
 not the same problem... :

As far as I know the watchdog timeouts are _supposed_ to be
mostly harmless, i.e. recoverable.



If it ups/downs the interface, it can be painful depending on your 
setup. In one of the colos I don't have control over, the switch port 
will block for 15 seconds for Spanning Tree when the interface 
transitions like that. Even in cases where this does not happen, a 
1-2 second network outage can play havoc with some applications.


---Mike 




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hi!

On Wed, Sep 27, 2006 at 06:52:51AM -0700, Jeremy Chadwick wrote:
 On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
  On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
   it's just a me too. On our ftp server (ftp8.fr.freebsd.org), sometimes
   we see some watchdog timeout in the log with a bge card, but maybe it's
   not the same problem... :
  
  As far as I know the watchdog timeouts are _supposed_ to be
  mostly harmless, i.e. recoverable.
 
 You'll still see impact -- that is, no packets flowing.  The
 reason things are recoverable is solely because of the retry
 functionality for layer 2 packets...

You are, of course, right. What I meant is: these timeouts should
not lead to freezing of all network communications for a couple
of minutes, as some other people and I seem to experience.

TCP and most UDP based upper level protocols will recover
gently from a lost packet or two.

Regards,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Oliver Brandmueller

Hi.

On Wed, Sep 27, 2006 at 04:19:55PM +0200, Patrick M. Hausen wrote:
  You'll still see impact -- that is, no packets flowing.  The
  reason things are recoverable is solely because of the retry
  functionality for layer 2 packets...
 
 You are, of course, right. What I meant is: these timeouts should
 not lead to freezing of all network communications for a couple
 of minutes like me and some other people seem to experience.

Port fast on a switch port is not a desirable option in all cases, apart 
from the fact that in some places you probably don't have the access or 
the choice to enable it. Without it, this means at least 10-20 seconds 
without network on some switches until the port is up again on the switch 
after it went down!

- Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   Ich bin das Internet. Sowahr ich Gott helfe.   |
| Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Oliver Brandmueller
Hi Scott,

On Wed, Sep 27, 2006 at 03:16:57AM -0600, Scott Long wrote:
 Well, the best I can say at the moment is, Wow.  =-(  I guess the 
 thing to do here is to figure out if the problem lies with the em 
 interrupt handler not getting run, or the taskqueue not getting run.
 Since you've stated that it seems to be related to shared interrupts,
 the first possibility is more likely.  However, I'm not sure why the
 symptom would only be showing up now.  The Intel docs say that the
 82547EI are a bit interesting, and I wonder if assumptions that we
 make about PCI ordering aren't true (or if there are bugs that make
 our assumptions invalid).
 
 Does this happen after there has been a lot of disk activity, like a
 large tar extraction?  Are you using the SMBus interface at all, or is
 it sitting completely idle?

Disk activity does not trigger the problem, I hammered the disk with 
around 85 MB/s (dd) for about half an hour without seeing any effect. A 
CPU bound thing like a buildworld triggered the problem.

The SMBus interface is not used at all (it's not even really usable). 
Anyway, as soon as I unload the ichsmb module I cannot trigger the 
problem anymore. If I load it again, the problem can again be triggered 
by a buildworld. Statistical relevance: I did 4 buildworlds, alternating 
the load/unload of ichsmb. Both times with ichsmb loaded I saw 3 
watchdog timeouts while the buildworld was running; while ichsmb was 
not loaded I did not see a single watchdog timeout. The use of the 
interface was around the same the whole time (constant NFS traffic 
of around 1-2 MBit/s).

Since we all seem to see this only on interfaces sharing interrupts 
(as I read the other posters' mails) and the problem can be worked 
around by using polling, it seems pretty clear that it has to 
do with interrupt handling.

The UP/SMP idea seems to be of interest only because on a UP machine 
interrupts are more likely to be shared than on SMP machines; it has 
nothing to do with UP or SMP itself.

- Oliver


-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   I am the Internet. As truly as I help God.   |
| Commercial use of any of the addresses contained herein is not permitted! |




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 09:56:22AM -0400, Mike Tancsa wrote:
 If it up / downs the interface, it can be painful depending on your 
 setup. In one of the colos I don't have control over, the switch port 
 will block for 15 seconds for Spanning Tree when the interface 
 transitions like that.   Even in cases where this does not happen, a 
 1-2 second network outage can play havoc with some applications.

Ouch!  This is one of many reasons people don't use STP.  (I did note
the "colos I don't have control over" part -- frustrating eh?)

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Brooks Davis
On Wed, Sep 27, 2006 at 05:28:24PM +0200, Oliver Brandmueller wrote:
 Hi Scott,
 
 On Wed, Sep 27, 2006 at 03:16:57AM -0600, Scott Long wrote:
  Well, the best I can say at the moment is, Wow.  =-(  I guess the 
  thing to do here is to figure out if the problem lies with the em 
  interrupt handler not getting run, or the taskqueue not getting run.
  Since you've stated that it seems to be related to shared interrupts,
  the first possibility is more likely.  However, I'm not sure why the
  symptom would only be showing up now.  The Intel docs say that the
  82547EI are a bit interesting, and I wonder if assumptions that we
  make about PCI ordering aren't true (or if there are bugs that make
  our assumptions invalid).
  
  Does this happen after there has been a lot of disk activity, like a
  large tar extraction?  Are you using the SMBus interface at all, or is
  it sitting completely idle?
 
 Disk activity does not trigger the problem, I hammered the disk with 
 around 85 MB/s (dd) for about half an hour without seeing any effect. A 
 CPU bound thing like a buildworld triggered the problem.

I'm not sure that's a valid test by itself.  As things go, dd is pretty
easy on the disk IO system, especially with large buffer sizes.  I'd
suggest tar extraction or possibly parallel tar extraction.  The goal is
to generate a large number of transactions, not large transactions.

-- Brooks





Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 05:28:24PM +0200, Oliver Brandmueller wrote:
 Disk activity does not trigger the problem, I hammered the disk with 
 around 85 MB/s (dd) for about half an hour without seeing any effect. A 
 CPU bound thing like a buildworld triggered the problem.
 
 The SMBus Interface is not used at all (it's not even really usable). 
 Anyway, as soon as I unload the ichsmb module I cannot trigger the 
 problem anymore. If I load it again, the problem can again be triggered 
 by a buildworld. Statistical relevance: I did 4 buildworlds, alternating 
 the load/unload of ichsmb - both times with ichsmb loaded I saw 3 
 watchdog timeouts during the buildworld was running, while ichsmb was 
 not loaded I did not see a single watchdog timeout. The use of the 
 interface was around the same during all the time (constant NFS traffic 
 of around 1-2 MBit/s).

Interesting find.  For what it's worth -- I too load the appropriate
smbus drivers on the system with the em0 problem (loading smbus and
ichsmb).  That system is a single processor / single core system, with
HT disabled in the BIOS (which doesn't matter since FreeBSD disables
it anyways).  Kernel is non-SMP.  Only reason I mention this is:

 The UP/SMP idea seems to be only of interest, because on an UP machine 
 it's more likely to share interrupts than on SMP machines, it has 
 nothing to do with the fact of UP or SMP itself.

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Oliver Brandmueller
Hi,

On Wed, Sep 27, 2006 at 10:50:55AM -0500, Brooks Davis wrote:
  Disk activity does not trigger the problem, I hammered the disk with 
  around 85 MB/s (dd) for about half an hour without seeing any effect. A 
  CPU bound thing like a buildworld triggered the problem.
 
 I'm not sure that's a valid test by itself.  As things go, dd is pretty
 easy on the disk IO system, especially with large buffer sizes.  I'd
 suggest tar extraction or possibly parallel tar extraction.  The goal is
 to generate a large number of transactions, not large transactions.

The dd generated (according to gstat) around 600 tps by itself. Anyway,
at night, when the to-disk backups from the other machines are coming
in, there are various large and small disk operations, and it never
happens in that case. On the other hand, my other server, which does only 
a few things on the disk but has less CPU power and more CPU-bound 
work to do, shows the behaviour very often (until I started to use 
polling). Disk activity might be a cause if the interrupt is shared 
with a disk controller, which is not the case for any of my affected 
machines.

- Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   I am the Internet. As truly as I help God.   |
| Commercial use of any of the addresses contained herein is not permitted! |




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Oliver Brandmueller
Hi,

On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote:
  The SMBus Interface is not used at all (it's not even really usable). 
  Anyway, as soon as I unload the ichsmb module I cannot trigger the 
  problem anymore. If I load it again, the problem can again be triggered 
  by a buildworld. Statistical relevance: I did 4 buildworlds, alternating 
  the load/unload of ichsmb - both times with ichsmb loaded I saw 3 
  watchdog timeouts during the buildworld was running, while ichsmb was 
  not loaded I did not see a single watchdog timeout. The use of the 
  interface was around the same during all the time (constant NFS traffic 
  of around 1-2 MBit/s).
 
 Interesting find.  For what it's worth -- I too load the appropriate
 smbus drivers on the system with the em0 problem (loading smbus and
 ichsmb).  That system is a single processor / single core system, with
 HT disabled in the BIOS (which doesn't matter since FreeBSD disables
 it anyways).  Kernel is non-SMP.  Only reason I mention this is:
 
  The UP/SMP idea seems to be only of interest, because on an UP machine 
  it's more likely to share interrupts than on SMP machines, it has 
  nothing to do with the fact of UP or SMP itself.

I don't think it has to do with ichsmb in particular here, but only with the 
fact that ichsmb is, for me, exactly the thing that shares the interrupt 
with the em interface that shows the problems.

- Oliver

-- 
| Oliver Brandmueller | Offenbacher Str. 1  | Germany   D-14197 Berlin |
| Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
|   I am the Internet. As truly as I help God.   |
| Commercial use of any of the addresses contained herein is not permitted! |




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hi!

On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:

 I don't think it has to do with ichsmb in particular here, but only with the 
 fact that ichsmb is, for me, exactly the thing that shares the interrupt 
 with the em interface that shows the problems.

I can confirm that making em0 share an interrupt with the
SATA-controller on my box makes the problem much much more
apparent.

HTH,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Scott Long

Oliver Brandmueller wrote:

Hi,

On Wed, Sep 27, 2006 at 08:55:53AM -0700, Jeremy Chadwick wrote:

The SMBus Interface is not used at all (it's not even really usable). 
Anyway, as soon as I unload the ichsmb module I cannot trigger the 
problem anymore. If I load it again, the problem can again be triggered 
by a buildworld. Statistical relevance: I did 4 buildworlds, alternating 
the load/unload of ichsmb - both times with ichsmb loaded I saw 3 
watchdog timeouts during the buildworld was running, while ichsmb was 
not loaded I did not see a single watchdog timeout. The use of the 
interface was around the same during all the time (constant NFS traffic 
of around 1-2 MBit/s).


Interesting find.  For what it's worth -- I too load the appropriate
smbus drivers on the system with the em0 problem (loading smbus and
ichsmb).  That system is a single processor / single core system, with
HT disabled in the BIOS (which doesn't matter since FreeBSD disables
it anyways).  Kernel is non-SMP.  Only reason I mention this is:


The UP/SMP idea seems to be only of interest, because on an UP machine 
it's more likely to share interrupts than on SMP machines, it has 
nothing to do with the fact of UP or SMP itself.



I don't think it has to do with ichsmb in particular here, but only with the 
fact that ichsmb is, for me, exactly the thing that shares the interrupt 
with the em interface that shows the problems.


- Oliver



My theory here is that something in the kernel, likely VM/VFS, is
holding the Giant lock for an inordinate amount of time.  During this
time, an interrupt fires on the shared em/ichsmb interrupt.  The em
interrupt handler runs and schedules a task to handle the event.  Then
the system blocks the interrupt at the PIC and schedules the ichsmb
ithread.  However, as soon as this ithread tries to run, it gets blocked
on the Giant lock that is held elsewhere.  While it is blocked, the
interrupt stays masked at the PIC, blocking out both ichsmb and em
device interrupts.  Normally the PIC would get unmasked after the
ithread has run, but until the ithread unblocks, this cannot happen.
This goes on long enough that pending transactions on the em interface
trigger a timeout.
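
To make the sequence above concrete, here is a toy timeline model of it. This is only a sketch under stated assumptions, not kernel code: the function names, the millisecond units, and every number in the example are hypothetical. It models just one rule from the theory, namely that a shared line, once masked, stays masked until the Giant-locked ithread has been able to run.

```python
def masked_intervals(giant_holds, irq_times, ithread_run_ms=1):
    """Return (mask_start, mask_end) intervals for a shared interrupt line.

    giant_holds: list of (start, end) times during which some other thread
                 holds Giant (hypothetical numbers, in milliseconds).
    irq_times:   times at which the shared line fires.
    The line is masked when it fires and unmasked only after the
    Giant-locked ithread has run, which must wait for Giant to be free.
    """
    intervals = []
    masked_until = 0
    for t in sorted(irq_times):
        if t < masked_until:
            continue  # line is still masked; this interrupt is not seen yet
        start_run = t
        for hold_start, hold_end in sorted(giant_holds):
            if hold_start <= start_run < hold_end:
                start_run = hold_end  # ithread blocks until Giant is dropped
        masked_until = start_run + ithread_run_ms
        intervals.append((t, masked_until))
    return intervals


def watchdog_fires(intervals, timeout_ms):
    """True if any masked interval is long enough to trip the watchdog."""
    return any(end - start >= timeout_ms for start, end in intervals)


# Hypothetical numbers: Giant is held from t=100 ms to t=6100 ms, and the
# watchdog timeout is assumed to be 5000 ms.
intervals = masked_intervals([(100, 6100)], [50, 200, 3000])
print(intervals, watchdog_fires(intervals, timeout_ms=5000))
```

With the Giant hold removed from the input, every masked interval shrinks to the ithread's own run time and the watchdog check stays false.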

Assuming this analysis is correct, there are a couple of questions.
First would be, why is the ithread being blocked for so long?  Is the
Giant lock actually being held continuously for that long, or is it being
dropped and relocked often but the scheduler isn't giving the ithread a
chance to grab it and run?  Second is, why is this only being noticed
now?  Whether the em driver uses an INTR_FAST handler, like it does now,
or an ithread handler, like it used to in 6.1, doesn't affect the ichsmb
driver and its interaction with the Giant lock.  Maybe there isn't a
direct correlation here, and it's just a coincidence that something else
in the system changed at the same time as the driver changing.

I have a few ideas on tracking down the root cause, but they are pretty
painful and slow.  The root cause does need to be found and
fixed, as it's either a very bad scheduler bug, or a very badly
misbehaving subsystem.  Both have implications for other possible
problems in FreeBSD.  Also, the usb driver has the same potential for
blocking as the ichsmb driver, as do other drivers.  But in the
meantime, something needs to be done for 6.2.  The options are:

1. Revert the em driver to its 6.1 form, ask people to test if the
problem persists.  If it doesn't, leave it at that for now.

2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither
uses an ithread.  Without an ithread, no PIC masking will happen, and
these drivers can block all they want without interfering with the
em driver.  This is a bit of risky work, though, and may not be possible
if the devices don't support certain functionality.  Also, it doesn't
address the root problem.  But, getting more interrupt handlers away
from needing Giant is a good thing, even if this is only a band-aid.

3. Spend the time tracking down and fixing the root problem for 6.2.
This is ideal, but it is also an unbounded problem.  Thus, it is
absolutely not conducive to a timely and successful 6.2 release.

4. Do nothing for now and tell people to disable usb, ichsmb, etc, as
needed.  This, of course, is not a good option.

Option 1 is the quickest and likely most risk-free fix for the 6.2
release.  If someone could test doing a revert and report back, I would
appreciate it.  Any volunteers?

Scott



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Tomasz Pilat

Well, HTH - I don't have *any* problems with this configuration:

FreeBSD 6.2-PRERELEASE #6: Wed Sep 20 18:52:56 CEST 2006 [EMAIL 
PROTECTED]:/usr/obj/usr/src/sys/MAILSMP

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.20-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf48  Stepping = 8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x649d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,<b14>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
  Logical CPUs per core: 2
real memory  = 9126805504 (8704 MB)
avail memory = 8302972928 (7918 MB)
ACPI APIC Table: <DELL   PE BKC  >

pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xecc0-0xecff mem 0xfe6e-0xfe6f irq 64 at device 7.0 on pci6
em0: [FAST]
pci7: <ACPI PCI bus> on pcib7
em1: <Intel(R) PRO/1000 Network Connection Version - 6.1.4> port 0xdcc0-0xdcff mem 0xfe4e-0xfe4f irq 65 at device 8.0 on pci7
em1: [FAST]

[EMAIL PROTECTED]:7:0:   class=0x02 card=0x016d1028 chip=0x10768086 
rev=0x05 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82547EI Gigabit Ethernet Controller'
[EMAIL PROTECTED]:8:0:   class=0x02 card=0x016d1028 chip=0x10768086 
rev=0x05 hdr=0x00
vendor   = 'Intel Corporation'
device   = '82547EI Gigabit Ethernet Controller'

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=b<RXCSUM,TXCSUM,VLAN_MTU>
        media: Ethernet autoselect (1000baseTX <full-duplex>)
(em1 is not used)

interrupt                         total       rate
irq1: atkbd0                       1139          0
irq6: fdc0                            8          0
irq14: ata0                          36          0
irq18: uhci2                   21714980         37
irq23: ehci0                          3          0
irq46: amr0                    20493929         34
irq64: em0                    106173807        181
cpu0: timer                  1172649960       1999
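
Since the thread keeps coming back to whether the NIC shares its interrupt line, a small helper for spotting shared lines in `vmstat -i` style output may be useful. This is a minimal sketch, assuming each line has a source name, one or more device names, and then the total and rate columns; the function name and the sample text (in particular em0 sharing irq16 with ichsmb0) are hypothetical and not taken from any poster's machine.

```python
def shared_irqs(vmstat_text):
    """Map each interrupt source to its devices, keeping only shared ones.

    Assumes lines shaped like 'irq16: bce0 bce1   3019   5', i.e. a source
    name, one or more device names, then the total and rate columns.
    """
    shared = {}
    for line in vmstat_text.splitlines():
        source, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not source.startswith("irq") or len(fields) < 3:
            continue  # header, timer, total, or malformed line
        devices = fields[:-2]  # everything before the total and rate columns
        # A trailing '+' is assumed here to mark a truncated device list.
        if len(devices) > 1 or devices[0].endswith("+"):
            shared[source] = devices
    return shared


# Hypothetical sample, not real output from any machine in this thread.
sample = """\
interrupt                          total       rate
irq14: ata0                           47          0
irq16: em0 ichsmb0                  3019          5
irq64: em1                          1214          2
cpu0: timer                      1118344       1997
Total                            1122753       2004
"""
print(shared_irqs(sample))
```

Run against the listing above, such a check would report nothing, since em0 sits alone on irq64 there.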

This is a heavy-duty mail server, loaded with a lot of
postfix/amavis/courier processes.

I can provide my kernel/loader/sysctl configuration on request.


Ponc
-- 
Tomasz Pilat  http://poncki.freebsd.pl./
AXEL SPRINGER POLSKA Sp. z o.o.   PONC-RIPE | PGPKEY-EDEB47FC

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on e-mail/Usenet?



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 06:32:59PM +0200, Patrick M. Hausen wrote:
 On Wed, Sep 27, 2006 at 05:59:04PM +0200, Oliver Brandmueller wrote:
  I don't think it has to do with ichsmb in particular here, but only with the 
  fact that ichsmb is, for me, exactly the thing that shares the interrupt 
  with the em interface that shows the problems.
 
 I can confirm that making em0 share an interrupt with the
 SATA-controller on my box makes the problem much much more
 apparent.

So we're all on the same page here -- this really appears to be some
kind of kernel interrupt handler problem (something somewhere is
getting deadlocked?  Not sure).

Has anyone tried rolling back to previous 6.2 builds to try and
figure out timeframes when this was introduced?  From my perspective,
it happened sometime between August and the end of September.

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Javier Henderson


On Sep 27, 2006, at 11:50 AM, Jeremy Chadwick wrote:


On Wed, Sep 27, 2006 at 09:56:22AM -0400, Mike Tancsa wrote:

If it up / downs the interface, it can be painful depending on your
setup. In one of the colos I don't have control over, the switch port
will block for 15 seconds for Spanning Tree when the interface
transitions like that.   Even in cases where this does not happen, a
1-2 second network outage can play havoc with some applications.


Ouch!  This is one of many reasons people don't use STP.  (I did note
the "colos I don't have control over" part -- frustrating eh?)


You could enable port fast and still have spanning tree in place.

What many reasons do you and others have to shun STP?

-jav




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Patrick M. Hausen
Hi, Scott!

On Wed, Sep 27, 2006 at 10:32:49AM -0600, Scott Long wrote:

 1. Revert the em driver to its 6.1 form, ask people to test if the
 problem persists.  If it doesn't, leave it at that for now.

For me the problem manifested itself some time between 6.0
and 6.1. I did the testing with Pyun with 6-STABLE up to
two weeks before 6.2-PRERELEASE.

Currently we do not dare upgrade typo3.org from 5.5 to 6.x
for precisely this problem. 5.5 is running fine for the
time being, no need to hurry for the latest and greatest, yet.

And the problem is not at all bound to shared interrupts.
I'll let you ssh in, if you like.

Kind regards,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jack Vogel

As an optional data point you might wish to consider the Intel
driver I am about to release, it has everything that 6.2 does
EXCEPT the interrupt changes. I kept those out because I
didn't want to break backward compatibility. If someone who
has repro'd this problem wants to check this, speak up and
I'll send a tarball.

Jack


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jeremy Chadwick
On Wed, Sep 27, 2006 at 12:44:04PM -0400, Javier Henderson wrote:
 You could enable port fast and still have spanning tree in place.
 
 What many reasons do you and others have to shun STP?

Rather than ramble off all the things I've experienced with STP,
most of them are covered in this caveat document written by none
other than Cisco:

http://www.cisco.com/warp/public/473/16.html

portfast is mentioned, but I'll remind you that not everyone uses
Cisco equipment (nor should they).  I consider portfast an admission
that STP wasn't such a great idea after all.

<opinion>
My logic is as follows: a properly managed network should never
encounter layer 1 loops.  STP is most commonly used for "oh crap, I
made a mistake" situations.  Humans aren't perfect, but if you've
engineers who continue to make physical segment loops over and over,
you're better off getting different engineers rather than deploying
STP and making a mess of network fail-over reliability.
</opinion>

Regardless, this is totally off-topic for the list.  I'll be more
than happy to discuss all of this privately.  :-)

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, USA |
| Making life hard for others since 1977.   PGP: 4BD6C0CB |



Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Mike Tancsa

At 12:32 PM 9/27/2006, Scott Long wrote:


My theory here is that something in the kernel, likely VM/VFS, is
holding the Giant lock for an inordinate amount of time.  During this
time, an interrupt fires on the shared em/ichsmb interrupt.  The em


Hi Scott,
Do you think this issue is something particular to Intel-based
chipsets, and specifically NICs that share their interrupt with
ichsmb or the USB subsystem?  I have not gone through all the
threads, but I don't recall people with, say, AMD-based boards running
into this issue.


---Mike 




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Jonathan Chen
On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
 Hi!
 
 On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
 
  it's just a "me too". On our ftp server (ftp8.fr.freebsd.org), sometimes
  we see some watchdog timeout in the log with a bge card, but maybe it's
  not the same problem... :
 
 As far as I know the watchdog timeouts are _supposed_ to be
 mostly harmless, i.e. recoverable.
 
 Some people experience additional complete hangs of network
 communications, that may or may not be related to them.

I had watchdog timeouts occur on a small network setup, for a:

ssh remote "cd /usr && tar cf - ports" | tar xvf -

and this resulted in a pretty sparse ports tree on the local drive.
Lots of stuff being dropped. Shifting a single big tar-ball worked
though.
-- 
Jonathan Chen [EMAIL PROTECTED]
--
If everything's under control, you're going too slow
  - Mario Andretti


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Brooks Davis
On Thu, Sep 28, 2006 at 06:32:16AM +1200, Jonathan Chen wrote:
 On Wed, Sep 27, 2006 at 03:25:55PM +0200, Patrick M. Hausen wrote:
  Hi!
  
  On Wed, Sep 27, 2006 at 02:42:30PM +0200, Philippe Pegon wrote:
  
   it's just a "me too". On our ftp server (ftp8.fr.freebsd.org), sometimes
   we see some watchdog timeout in the log with a bge card, but maybe it's
   not the same problem... :
  
  As far as I know the watchdog timeouts are _supposed_ to be
  mostly harmless, i.e. recoverable.
  
  Some people experience additional complete hangs of network
  communications, that may or may not be related to them.
 
 I had watchdog timeouts occur on a small network setup, for a:
 
   ssh remote "cd /usr && tar cf - ports" | tar xvf -
 
 and this resulted in a pretty sparse ports tree on the local drive.
 Lots of stuff being dropped. Shifting a single big tar-ball worked
 though.

I'm highly skeptical of this claim.  It's possible the connection failed
part way through and thus you didn't get all your files, but you
wouldn't get random dropouts.  TCP doesn't work that way.

-- Brooks




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Peter Jeremy
On Wed, 2006-Sep-27 10:32:49 -0600, Scott Long wrote:
My theory here is that something in the kernel, likely VM/VFS, is
holding the Giant lock for an inordinate amount of time.

In the past (RELENG_5) I've had major problems with syncer delaying
interrupt threads for long periods (I've seen 8 msec).  See
http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
I'm not sure if this is still a problem (but I am still having some
problems which may be caused by excessive interrupt latency and will be
doing some debugging as I get time).

I have a few ideas on tracking down the root cause, but they are pretty
painful and slow.

In my case, I was fairly certain that the problem I was seeing was
excessive interrupt latency for my driver.  The approach I took was to
capture TSC, IRQ number and curproc address in lapic_handle_intr(),
atpic_handle_intr() and at the beginning of my interrupt handler into
a ring buffer.  I'd dump the ring buffer into a file using a userland
tool and then post-process the file looking for oddities.  In my case,
there was a _very_ high correlation between long latencies and syncer.
If anyone's interested in this approach, I can provide the relevant
code diffs.
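
The post-processing step described above might look roughly like the following. It is a sketch only: the record layout (`tsc`, `irq`, `site`, `curproc`), the site labels, and the use of process names instead of curproc addresses are assumptions made for readability, and the TSC frequency in the example is a made-up round number, not a measured one.

```python
from collections import namedtuple

# One ring-buffer record. "site" says where it was captured: "lowlevel" for
# the low-level interrupt entry point, "driver" for the start of the handler.
Sample = namedtuple("Sample", "tsc irq site curproc")


def long_latencies(samples, tsc_hz, threshold_us):
    """Pair each low-level entry with the next driver entry for the same IRQ
    and return (irq, latency_us, curproc) for pairs above the threshold."""
    pending = {}  # irq -> Sample recorded at the low-level entry point
    slow = []
    for s in sorted(samples, key=lambda rec: rec.tsc):
        if s.site == "lowlevel":
            pending.setdefault(s.irq, s)  # keep the earliest unmatched entry
        elif s.site == "driver" and s.irq in pending:
            first = pending.pop(s.irq)
            latency_us = (s.tsc - first.tsc) * 1_000_000 / tsc_hz
            if latency_us > threshold_us:
                # curproc at interrupt time hints at what delayed the handler
                slow.append((s.irq, latency_us, first.curproc))
    return slow


# Hypothetical trace with an assumed 1 GHz TSC.
trace = [
    Sample(1_000, 16, "lowlevel", "idle"),
    Sample(11_000, 16, "driver", "irq16"),
    Sample(2_000_000, 16, "lowlevel", "syncer"),
    Sample(10_000_000, 16, "driver", "irq16"),
]
print(long_latencies(trace, tsc_hz=1_000_000_000, threshold_us=1000))
```

Grouping the flagged entries by `curproc` is then enough to see whether one process, such as syncer, dominates the long latencies.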

2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither
uses an ithread.

The problem I ran into with this approach was that my interrupt
handler needs to use psignal(9) - which requires PROC_LOCK() which
(AFAIK) isn't allowed in an INTR_FAST handler.

It would be useful if our interrupt subsystem allowed both INTR_FAST
and normal interrupt handlers to be defined.  If an INTR_FAST handler
is defined then it gets executed and defines whether its associated
interrupt thread handler needs to be triggered.  If there's no
INTR_FAST handler then the interrupt thread is always triggered.
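
The dispatch rule proposed above can be written down as a few lines of logic. This is a model of the suggestion only, not of any existing FreeBSD interface: `dispatch`, its arguments, and the list used as a run queue are all hypothetical names.

```python
def dispatch(fast_handler, thread_handler, run_queue):
    """Model of the proposed rule for a single interrupt.

    fast_handler:   callable returning True if the threaded handler is
                    needed, or None if the driver registered no fast handler.
    thread_handler: callable to run later in the interrupt thread, or None.
    run_queue:      list standing in for the scheduler's queue of ithread work.
    """
    if fast_handler is None:
        need_thread = True  # no fast handler: always trigger the thread
    else:
        need_thread = bool(fast_handler())  # fast handler decides
    if need_thread and thread_handler is not None:
        run_queue.append(thread_handler)
    return need_thread


def thread_work():
    """Placeholder for the part of a handler that needs sleepable locks."""
    return "done"


# Hypothetical drivers on one line: one handles everything in its fast
# handler, one asks for its thread, one has no fast handler at all.
example_queue = []
dispatch(lambda: False, thread_work, example_queue)
dispatch(lambda: True, thread_work, example_queue)
dispatch(None, thread_work, example_queue)
print(len(example_queue))
```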

-- 
Peter Jeremy




Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Scott Long

Peter Jeremy wrote:

> On Wed, 2006-Sep-27 10:32:49 -0600, Scott Long wrote:
>> My theory here is that something in the kernel, likely VM/VFS, is
>> holding the Giant lock for an inordinate amount of time.
>
> In the past (RELENG_5) I've had major problems with syncer delaying
> interrupt threads for long periods (I've seen 8msec).  See
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> I'm not sure if this is still a problem (but I am still having some
> problems which may be caused by excessive interrupt latency and will be
> doing some debugging as I get time).
>
>> I have a few ideas on tracking down the root cause, but they are pretty
>> painful and slow.
>
> In my case, I was fairly certain that the problem I was seeing was
> excessive interrupt latency for my driver.  The approach I took was to
> capture TSC, IRQ number and curproc address in lapic_handle_intr(),
> atpic_handle_intr() and at the beginning of my interrupt handler into
> a ring buffer.  I'd dump the ring buffer into a file using a userland
> tool and then post-process the file looking for oddities.  In my case,
> there was a _very_ high correlation between long latencies and syncer.
> If anyone's interested in this approach, I can provide the relevant
> code diffs.



Yes, I was thinking about the syncer too, but the timeouts for ethernet
interfaces are measured in seconds, not milliseconds.




>> 2. Add INTR_FAST shims to the usb and ichsmb drivers so that neither
>> uses an ithread.
>
> The problem I ran into with this approach was that my interrupt
> handler needs to use psignal(9) - which requires PROC_LOCK() which
> (AFAIK) isn't allowed in an INTR_FAST handler.


You can define a very simple INTR_FAST handler that just disables the
interrupt at the device and then schedules a taskqueue to do the real
work.  This is what I did for if_em, actually.
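
A userland sketch of that pattern, under stated assumptions: this is not
the if_em code, the demo_* names are made up, and a pair of boolean
flags stands in for the device's interrupt mask register and for the
taskqueue. The point is only the split of work: the fast handler does
two cheap things and returns, and everything that might need a regular
lock (PROC_LOCK() for psignal(9), for instance) happens later in thread
context.

```c
#include <stdbool.h>

struct demo_nic {
    bool intr_enabled;  /* models the interrupt mask at the device */
    bool task_queued;   /* models a task pending on a taskqueue */
    int  rx_pending;    /* work waiting in the device */
    int  rx_processed;  /* work completed by the deferred task */
};

/* Fast handler: no sleeping, no regular locks, just mask and defer. */
static void
demo_intr_fast(struct demo_nic *nic)
{
    nic->intr_enabled = false;  /* disable the interrupt at the device */
    nic->task_queued = true;    /* schedule the real work */
}

/* Deferred task: runs in thread context, so ordinary locks are fine. */
static void
demo_task(struct demo_nic *nic)
{
    nic->rx_processed += nic->rx_pending;
    nic->rx_pending = 0;
    nic->intr_enabled = true;   /* re-enable once the work is done */
}

/* Stand-in for the taskqueue thread picking up pending work. */
static void
demo_taskqueue_run(struct demo_nic *nic)
{
    if (nic->task_queued) {
        nic->task_queued = false;
        demo_task(nic);
    }
}
```

Because the device interrupt stays masked until the task finishes, the
fast handler cannot be re-entered for the same device while the
deferred work is still outstanding.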



> It would be useful if our interrupt subsystem allowed both INTR_FAST
> and normal interrupt handlers to be defined.  If an INTR_FAST handler
> is defined then it gets executed and defines whether its associated
> interrupt thread handler needs to be triggered.  If there's no
> INTR_FAST handler then the interrupt thread is always triggered.



This was an SoC2006 project, and I believe it will be committed fairly soon.

Scott
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread David G Lawrence
> In the past (RELENG_5) I've had major problems with syncer delaying
> interrupt threads for long periods (I've seen 8msec).  See
> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
> I'm not sure if this is still a problem (but I am still having some
> problems which may be caused by excessive interrupt latency and will be
> doing some debugging as I get time).
...
> tool and then post-process the file looking for oddities.  In my case,
> there was a _very_ high correlation between long latencies and syncer.
> If anyone's interested in this approach, I can provide the relevant
> code diffs.

   I've seen this problem as well - results in around 9-10ms of occasional
scheduling delay for a real-time streaming application that I'm developing.
Shutting off softupdates on all of the mounted filesystems helps.
   Note that the watchdog timeout for the network drivers is usually 8000ms
(8 seconds), so this is unlikely to be related to that problem.
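
To put numbers on that, here is a small model of a countdown watchdog.
The 8 second figure is the one quoted above; the assumption that the
counter is decremented once per second, and all of the demo_* names,
belong to this sketch only and are not taken from any particular driver.

```c
#include <stdbool.h>

#define DEMO_WATCHDOG_SECONDS 8 /* 8000ms, the figure quoted above */

struct demo_watchdog {
    int  timer;                 /* seconds left; 0 means disarmed */
    bool fired;
};

/* Armed when a transmit is handed to the hardware. */
static void
demo_wd_arm(struct demo_watchdog *w, int seconds)
{
    w->timer = seconds;
}

/* Disarmed when the transmit completion is finally processed. */
static void
demo_wd_disarm(struct demo_watchdog *w)
{
    w->timer = 0;
}

/* Called once per second (assumed tick rate). */
static void
demo_wd_tick(struct demo_watchdog *w)
{
    if (w->timer > 0 && --w->timer == 0)
        w->fired = true;
}

/*
 * Arm the watchdog, delay completion processing by stall_ms, and
 * report whether the watchdog fired before the completion was seen.
 */
static bool
demo_stall_trips_watchdog(unsigned stall_ms)
{
    struct demo_watchdog w = { 0, false };

    demo_wd_arm(&w, DEMO_WATCHDOG_SECONDS);
    for (unsigned tick = 0; tick < stall_ms / 1000; tick++)
        demo_wd_tick(&w);
    if (!w.fired)
        demo_wd_disarm(&w);
    return w.fired;
}
```

In this model a 10ms scheduling delay never even reaches the first tick,
so on its own it cannot account for a watchdog timeout; the handler
would have to be held off for the full 8 seconds.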

-DG

David G. Lawrence
President
Download Technologies, Inc. - http://www.downloadtech.com - (866) 399 8500
The FreeBSD Project - http://www.freebsd.org
Pave the road of life with opportunities.


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Volker
Patrick M. Hausen wrote:
> Hello!
>
>> Well, the best I can say at the moment is, Wow.  =-(  I guess the
>> thing to do here is to figure out if the problem lies with the em
>> interrupt handler not getting run, or the taskqueue not getting run.
>
> I helped Pyun with some debugging by providing ssh access to
> a machine showing the (seemingly) same problem.
>
> At first he thought the interrupt handler of the em driver was
> the culprit, but we applied quite a few patches and tested
> afterwards - seems like the driver is not the cause.
>
> On -stable occasionally other people complained about very similar
> looking problems with bge and other drivers. My guess is, though
> I'm not a kernel developer, just an experienced admin, that
> em stands out as problematic just by coincidence. Certain onboard
> network components tend to come with certain chipsets and certain
> architectures.
>
> So, Pyun suggested it was a problem with the taskqueue that was
> introduced some time between 6.0 and 6.1.
>
> With my system (Tyan GT20 B5161G20) the problem shows when there
> is heavy disk and cpu activity, like make buildworld.
> I made sure that the em interface doesn't share an interrupt
> with the SATA controller. When the problem occurs, I get the
> well known watchdog timeout messages and then the system's
> network activity over that interface freezes completely for
> a couple of minutes.
> Usually the system recovers after a while without reboot or
> other measures.

Strange... I've seen exactly that on a (recent) RELENG_6 box, but
using a dirty old USB 1.1 NIC (aue). I've seen DOWN and UP messages
on the console all the time (mostly while rebuilding kernel + world +
ports) but did not care about them.

The machine in question is an Athlon XP-64 Socket 939, Asus A8N-VM
CSM. The USB ethernet NIC is a low budget ADMtek device. My
observations are probably not related to your issues, but they may be
a sign that this is not really a driver issue and not GigE related.

Greetings,

Volker


Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Scott Long

David G Lawrence wrote:

>> In the past (RELENG_5) I've had major problems with syncer delaying
>> interrupt threads for long periods (I've seen 8msec).  See
>> http://lists.freebsd.org/pipermail/freebsd-stable/2005-February/012346.html
>> I'm not sure if this is still a problem (but I am still having some
>> problems which may be caused by excessive interrupt latency and will be
>> doing some debugging as I get time).
>
> ...
>
>> tool and then post-process the file looking for oddities.  In my case,
>> there was a _very_ high correlation between long latencies and syncer.
>> If anyone's interested in this approach, I can provide the relevant
>> code diffs.
>
>    I've seen this problem as well - results in around 9-10ms of occasional
> scheduling delay for a real-time streaming application that I'm developing.
> Shutting off softupdates on all of the mounted filesystems helps.
>    Note that the watchdog timeout for the network drivers is usually 8000ms
> (8 seconds), so this is unlikely to be related to that problem.



Well, I kinda danced around the issue before, but I'll say it now.  I,
as well as a few others, have seen instances of Giant being held by the
syncer for 5 or more seconds at a time.  I can't explain why, and I've
never been able to catch it in the act in a meaningful way.  But it is
known to happen.  My best wild guess is that the syncer is doing a lot
of work (there is no question here), and keeps on getting preempted, and
as part of this, it blocks without locks being dropped.  Actually, this
is most likely exactly what is going on.  The syncer is sending out I/O
and is getting interrupted+preempted by the sata controller+driver, and
it winds up making very slow progress, while never actually releasing
Giant.

An easy way to test this would be to turn off preemption.  Could someone
with this problem remove the 'options PREEMPTION' line in their kernel
config and recompile/retest?  If this is in fact the root cause, then it
indeed has nothing to do with em driver INTR_FAST changes.  The easiest
fix then becomes the ichsmb and usb driver shims that I talked about.
The longer term fix is to continue progress on making the syncer run
without Giant and also not do so much work.  I think that there should
also be some discussion on the locking consequences of preemption.

Scott

