Another em0 watchdog timeout

2007-05-01 Thread Michael Collette

I realize there is a previous thread discussing this, but my symptoms
seem to be a little bit different.  Here's the stats...

FreeBSD 6.2-STABLE #1: Fri Apr 27 17:28:22 PDT 2007

[EMAIL PROTECTED]:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
   vendor = 'Intel Corporation'
   device = 'PRO/1000 PM'
   class  = network
   subclass   = ethernet
[EMAIL PROTECTED]:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
   vendor = 'Intel Corporation'
   class  = network
   subclass   = ethernet

em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port
0x5000-0x501f mem 0xea30-0xea31 irq 16 at device 0.0 on pci13
em0: Ethernet address: 00:30:48:5c:cc:84
em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port
0x6000-0x601f mem 0xea40-0xea41 irq 17 at device 0.0 on pci14
em1: Ethernet address: 00:30:48:5c:cc:85

I'm seeing the following entries in my messages log pop up about 2-4
times a day...

May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting
May 1 08:29:38 alpha kernel: em0: link state changed to DOWN
May 1 08:29:41 alpha kernel: em0: link state changed to UP

I've gone and added the DEVICE_POLLING option in the kernel, but this
doesn't seem to help.  The problem only seems to happen during the
hours that my users would be hitting this box, so it really gets
noticed when those 3 seconds go by.  And yes, it's almost always a 3
second drop on the interface.
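
To check whether the resets really cluster in the busy hours, the log lines can be tallied by hour. This is only a rough sketch: it assumes the syslog line format shown above, and the function name watchdog_by_hour is made up for illustration.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting
WATCHDOG = re.compile(
    r"^\w{3}\s+\d+\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+):\s+watchdog timeout"
)

def watchdog_by_hour(lines):
    """Count watchdog timeouts per (interface, hour of day)."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.match(line)
        if m:
            counts[(m.group("ifname"), int(m.group("hour")))] += 1
    return counts

# The three log lines quoted in this post; only the first is a watchdog event.
sample = [
    "May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting",
    "May 1 08:29:38 alpha kernel: em0: link state changed to DOWN",
    "May 1 08:29:41 alpha kernel: em0: link state changed to UP",
]

for (ifname, hour), n in sorted(watchdog_by_hour(sample).items()):
    print(f"{ifname} {hour:02d}:00 {n}")
```

On a real box the same function could be fed an open file handle for /var/log/messages instead of the sample list.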

Is there anything I can do to prevent this from happening?  I saw
mention of a firmware update I might try, but haven't been able to
locate the file in question.

Thanks,
--
When you come to a fork in the road, take it.
- Yogi Berra
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: NFS Locking Issue

2006-07-03 Thread Michael Collette

Garance A Drosihn wrote:

At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:

John Hay writes:


I only started to see the lockd problems when upgrading
the server side to FreeBSD 6.x and later. I had various
FreeBSD clients, between 4.x and 7-current and the lockd
problem only showed up when upgrading the server from
5.x to 6.x.


It confirms the same we are experiencing.. constant
freezing/locking issues.  I guess no more 6.X for us.. for
the foreseeable future..


I don't know if this will be of any help to anyone,
but...

I recently moved a network-based service from a 4.x machine
to a 6.x machine.  Despite some testing in advance of the
switch, many people had problems with the service.  I booted
to a somewhat out-of-date snapshot of 5.x on the same box.
I still had problems, but it didn't seem as bad, so I stuck
with the 5.x system.  Some problems turned out to be bugs
in the service itself, and were eventually found and fixed.

However, one set of problems on that out-of-date snapshot
of 5.x were solved by adding:

net.inet.tcp.rfc1323=0

to /etc/sysctl.conf.  The guy who suggested that said it
avoided a bug which was fixed in later versions of either
5.x or 6.x, I forget which.  Of interest is that the bug
was such that some people connecting to the service were
never bothered by the bug, while other people could not use
the service at all until I turned off tcp.rfc1323 .

I have a test version of the same service running on a
different FreeBSD/i386 box, and that box is now updated
to freebsd-stable as of June 10th.  Lo and behold, someone
connecting to that test box reported some problems.  So I
typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem
immediately disappeared.  So, it might be that there is
still some problem with the rfc1323 processing, or that the
bug which had been fixed has somehow been re-introduced.

In any case, people who are experiencing problems with NFS
might want to try that, and see if it makes any difference.
It does strike me as odd that some people are having a *lot*
of trouble with NFS under 6.x, while others seem to be okay
with it.  Perhaps the difference is the network topology
between the NFS server and the NFS clients.

Obviously, this is nothing but a guess on my part.  I am
not a networking guru!



Thanks for the try Garance, but in my setup it didn't make any 
difference.  I'll get into a bit more detail about my setup in another post.


Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]


Re: NFS Locking Issue

2006-07-03 Thread Michael Collette
, since re-configuring it, it hasn't exhibited the problem 
... if most of us get our machines configured properly to give useful 
information to the developers to debug this, the faster it will get 
fixed ...


My experience with most of the developers is that if you can get into 
DDB and give them 'internal traces' of the code, bugs tend to get fixed 
very quickly ... vmstat/ps give external views, more summaries than 
anything ... it's the details under the hood that they need ... it's not 
much different than your auto-mechanic ... try telling him there is a 
'knocking under the hood, please tell me how to fix it, but you can't 
have my car', and he'll brush you off ... give him 30 minutes under the 
hood, and not only will he have identified it, but he'll probably fix it 
too ...


Marc, the car is starting but won't move at all.  I don't know if this 
is the transmission, the steering wheel, or the radio.  I am feeling 
pretty certain that this car should never have left the lot in this 
condition though.


Again, these are problems that have been around for a while...
http://www.freebsd.org/cgi/query-pr.cgi?pr=84953
http://www.freebsd.org/cgi/query-pr.cgi?pr=80389

Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]


Re: NFS Locking Issue

2006-06-30 Thread Michael Collette

Michel Talon wrote:
I guess I'm still just a bit stunned that a bug this obvious not only 
found its way into the STABLE branch, but is still there.  Maybe it's 
not as obvious as I think, or not many folks are using it?  All I know 
for sure here is that if I had upgraded to 6.1 my network would have 
been crippled.


Strange, since I upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5,
my machine, an NFS client, is happy, and lockd works. It is the first time in
years I have had no problem. It certainly did not work with FreeBSD-5, and I still
have a machine with FreeBSD-6.0 which does not work properly (it frequently loses
the NFS mount, but it gets remounted some time later by amd). Anyway, I have
exactly 0 problems with the 6.1 machine. I could extend that to say that
everything works very well on that machine, nothing is slow, including disk
access. This has not always been the case. Stability-wise, I have not seen any
panic, hang or whatever since I compiled a kernel adapted to my hardware.
I got a panic with the generic kernel soon after installation, but now the
machine is totally stable.


Based on prior reading about this problem, I'd venture to guess that the 
file locking between FC5 and FreeBSD simply isn't happening.  See, between just 2 
machines sharing files without rpc.lockd running you won't see a 
problem.  Both the client and the server must not only be running 
rpc.lockd, but they must be able to actually talk to each other.


For a simple 2 machine setup, you don't really need much in the way of 
locking control, as you don't have to deal with multiple requests for 
the same resource.  This is why folks just running the -L flag on 
their mount command also aren't having any problems.


To actually see the problem isn't too hard to set up.  Just have 
rpc.lockd, rpc.statd, and rpcbind enabled on both the client and the 
server.  Then just start trying to transfer a stack of files from one 
to the other.  I found this to be true even trying to go from a 5.4 
server to my 6.1 laptop here.
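
A quicker check than copying a stack of files is to request a single fcntl lock on a file that lives on the NFS mount and see whether the request ever comes back. The following is a minimal sketch under a few assumptions: the helper name probe_lock is hypothetical, it must run in the main thread because it uses SIGALRM, and on a mount that is not interruptible the alarm may not be able to break a stuck request. A mount made with -L keeps locks local, so it would not exercise rpc.lockd at all.

```python
import errno
import fcntl
import os
import signal

class LockTimeout(Exception):
    """Raised by the SIGALRM handler when the lock request takes too long."""

def _on_alarm(signum, frame):
    raise LockTimeout()

def probe_lock(path, timeout=5):
    """Try an exclusive fcntl lock on path.

    Returns "ok" if the lock was granted, "busy" if another holder has it,
    and "timeout" if the request did not return within `timeout` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout)
    try:
        # Non-blocking request: a healthy lock service answers right away,
        # either granting the lock or reporting that it is held elsewhere.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "ok"
    except LockTimeout:
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return "busy"
        raise
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
        os.close(fd)
```

For example, probe_lock("/mnt/nfs/locktest") on a client, where /mnt/nfs is a placeholder for the actual mount point, should return "ok" almost immediately if locking works end to end; a "timeout" result would be consistent with the hang described here.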


There was quite a thread on this back in March of this year, along with 
a few PRs that are still open.  I'm personally just running headlong 
into all of this.


Later on,
--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]


NFS Locking Issue

2006-06-29 Thread Michael Collette
This last week I had been working on a test network to test out 6.1 
prior to upgrading our production boxes from 5.4.  That's when I ran 
across the rpc.lockd issues that have been discussed earlier.


Our production setup has diskless clients running KDE, which due to this 
bug is now dead on 6.1.  I also have my mail server delivering messages 
to a file server via NFS.  I even have servers booting diskless with NFS 
provided file systems... all of which are dead on 6.1.


The last discussion or bug updates I've seen on this issue were about 3 
months ago.  This leaves me with a number of questions I hope can be 
answered here on this list.


Is NFS a big deal for most other users, or am I out here on the fringe 
using it as much as I do?


Is anyone working on a fix for this?  If so, is there any kind of time 
frame where this fix might be MFC'd to 6-STABLE?


I guess I'm still just a bit stunned that a bug this obvious not only 
found its way into the STABLE branch, but is still there.  Maybe it's 
not as obvious as I think, or not many folks are using it?  All I know 
for sure here is that if I had upgraded to 6.1 my network would have 
been crippled.


Later on,
--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]


Re: NFS Locking Issue

2006-06-29 Thread Michael Collette

Rong-en Fan wrote:

On 6/29/06, Michael Collette [EMAIL PROTECTED] wrote:

This last week I had been working on a test network to test out 6.1
prior to upgrading our production boxes from 5.4.  That's when I ran
across the rpc.lockd issues that have been discussed earlier.

Our production setup has diskless clients running KDE, which due to this
bug is now dead on 6.1.  I also have my mail server delivering messages
to a file server via NFS.  I even have servers booting diskless with NFS
provided file systems... all of which are dead on 6.1.

The last discussion or bug updates I've seen on this issue were about 3
months ago.  This leaves me with a number of questions I hope can be
answered here on this list.

Is NFS a big deal for most other users, or am I out here on the fringe
using it as much as I do?

Is anyone working on a fix for this?  If so, is there any kind of time
frame where this fix might be MFC'd to 6-STABLE?

I guess I'm still just a bit stunned that a bug this obvious not only
found its way into the STABLE branch, but is still there.  Maybe it's
not as obvious as I think, or not many folks are using it?  All I know
for sure here is that if I had upgraded to 6.1 my network would have
been crippled.


Try 6.1-STABLE, especially make sure you have

$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $

for usr.sbin/rpc.lockd/kern.c, and see if this helps.


I am running STABLE on all my test boxes, and the problem is very much 
there.  It's not everything that locks up though.  I'm able to bring X 
up with twm, but unable to launch any Gnome or KDE applications without 
them being stranded in a lock state.


I sure would have loved for your suggestion to be correct.  For what 
it's worth, all the boxes I'm working with are on STABLE no more than a 
week old.  I ran fresh build worlds on all of them before getting the 
rest of my configs going.


Thanks,
--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]
