Another em0 watchdog timeout
I realize there is a previous thread discussing this, but my symptoms seem to be a little bit different. Here are the stats...

FreeBSD 6.2-STABLE #1: Fri Apr 27 17:28:22 PDT 2007

[EMAIL PROTECTED]:0:0: class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'PRO/1000 PM'
    class = network
    subclass = ethernet
[EMAIL PROTECTED]:0:0: class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet

em0: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x5000-0x501f mem 0xea30-0xea31 irq 16 at device 0.0 on pci13
em0: Ethernet address: 00:30:48:5c:cc:84
em1: Intel(R) PRO/1000 Network Connection Version - 6.2.9 port 0x6000-0x601f mem 0xea40-0xea41 irq 17 at device 0.0 on pci14
em1: Ethernet address: 00:30:48:5c:cc:85

I'm seeing the following entries in my messages log pop up about 2-4 times a day...

May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting
May 1 08:29:38 alpha kernel: em0: link state changed to DOWN
May 1 08:29:41 alpha kernel: em0: link state changed to UP

I've added the DEVICE_POLLING option to the kernel, but this doesn't seem to help. The problem only seems to happen during the hours that my users would be hitting this box, so it really gets noticed when those 3 seconds go by. And yes, it's almost always a 3 second drop on the interface.

Is there anything I can do to prevent this from happening? I saw mention of a firmware update I might try, but haven't been able to locate the file in question.

Thanks,

--
When you come to a fork in the road, take it - Yogi Berra

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
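To check the "2-4 times a day, only during busy hours" pattern against the log itself, the watchdog entries can be tallied per day. The following is only a sketch and not part of the original post: it assumes the syslog line layout shown above, and the names `count_watchdog_timeouts` and `report` are made up for illustration.

```python
import re
from collections import Counter

# Matches syslog lines of the form quoted in the post, e.g.
#   May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting
# (syslog usually pads single-digit days with an extra space; \s+ covers both).
WATCHDOG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<iface>\w+): watchdog timeout"
)


def count_watchdog_timeouts(lines, iface="em0"):
    """Count watchdog timeouts for one interface, keyed by (month, day)."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match and match.group("iface") == iface:
            counts[(match.group("month"), int(match.group("day")))] += 1
    return counts


def report(path="/var/log/messages", iface="em0"):
    """Print one line per day that saw at least one watchdog timeout."""
    with open(path, errors="replace") as log:
        for (month, day), n in count_watchdog_timeouts(log, iface).items():
            print(f"{month} {day}: {n} watchdog timeout(s) on {iface}")
```

Calling `report()` on the affected machine would show whether the resets really cluster on working days; the log path is an assumption and may differ if syslog has been reconfigured.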
Re: NFS Locking Issue
Garance A Drosihn wrote:
> At 9:13 PM -0400 7/1/06, Francisco Reyes wrote:
>> John Hay writes:
>>> I only started to see the lockd problems when upgrading the server side to FreeBSD 6.x and later. I had various FreeBSD clients, between 4.x and 7-current and the lockd problem only showed up when upgrading the server from 5.x to 6.x.
>>
>> It confirms the same thing we are experiencing.. constant freezing/locking issues. I guess no more 6.X for us.. for the foreseeable future..
>
> I don't know if this will be of any help to anyone, but...
>
> I recently moved a network-based service from a 4.x machine to a 6.x machine. Despite some testing in advance of the switch, many people had problems with the service. I booted to a somewhat out-of-date snapshot of 5.x on the same box. I still had problems, but it didn't seem as bad, so I stuck with the 5.x system. Some problems turned out to be bugs in the service itself, and were eventually found and fixed.
>
> However, one set of problems on that out-of-date snapshot of 5.x was solved by adding:
>
>     net.inet.tcp.rfc1323=0
>
> to /etc/sysctl.conf. The guy who suggested that said it avoided a bug which was fixed in later versions of either 5.x or 6.x, I forget which. Of interest is that the bug was such that some people connecting to the service were never bothered by the bug, while other people could not use the service at all until I turned off tcp.rfc1323.
>
> I have a test version of the same service running on a different FreeBSD/i386 box, and that box is now updated to freebsd-stable as of June 10th. Lo and behold, someone connecting to that test box reported some problems. So I typed in 'sysctl net.inet.tcp.rfc1323=0', and his problem immediately disappeared.
>
> So, it might be that there is still some problem with the rfc1323 processing, or that the bug which had been fixed has somehow been re-introduced. In any case, people who are experiencing problems with NFS might want to try that, and see if it makes any difference.
>
> It does strike me as odd that some people are having a *lot* of trouble with NFS under 6.x, while others seem to be okay with it. Perhaps the difference is the network topology between the NFS server and the NFS clients.
>
> Obviously, this is nothing but a guess on my part. I am not a networking guru!

Thanks for the try Garance, but in my setup it didn't make any difference. I'll get into a bit more detail about my setup in another post.

Later on,

--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]
Re: NFS Locking Issue
> , since re-configuring it, it hasn't exhibited the problem ... the more of us who get our machines configured properly to give useful information to the developers to debug this, the faster it will get fixed ...
>
> My experience with most of the developers is that if you can get into DDB and give them 'internal traces' of the code, bugs tend to get fixed very quickly ... vmstat/ps give external views, more summaries than anything ... it's the details under the hood that they need ...
>
> it's not much different than your auto-mechanic ... try telling him there is a 'knocking under the hood, please tell me how to fix it, but you can't have my car', and he'll brush you off ... give him 30 minutes under the hood, and not only will he have identified it, but he'll probably fix it too ...

Marc, the car is starting but won't move at all. I don't know if this is the transmission, the steering wheel, or the radio. I am feeling pretty certain that this car should never have left the lot in this condition though.

Again, these are problems that have been around for a while...

http://www.freebsd.org/cgi/query-pr.cgi?pr=84953
http://www.freebsd.org/cgi/query-pr.cgi?pr=80389

Later on,

--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]
Re: NFS Locking Issue
Michel Talon wrote:
>> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Strange, since i upgraded to FreeBSD-6.1 and the NFS server to Fedora Core 5, my machine, NFS client is happy, and lockd works. It is the first time in years i have no problem. It certainly did not work with FreeBSD-5 and i still have a machine with FreeBSD-6.0 which does not work properly (frequently loses the NFS mount, but it gets remounted some time later by amd).
>
> Anyways i have exactly 0 problem with the 6.1 machine. I could extend that to say that everything works very well on that machine, nothing is slow, including disk access. This has not always been the case. Stability wise, i have not seen any panic, hang or whatever since i have compiled a kernel adapted to my hardware. I got a panic with the generic kernel soon after installation, but now the machine is totally stable.

Based on prior reading about this problem, I'd venture to guess that the file locking between FC5 and FreeBSD simply isn't happening. See, between just 2 machines sharing files without rpc.lockd running you won't see a problem. Both the client and the server must not only be running rpc.lockd, but they must be able to actually talk to each other.

For a simple 2 machine setup, you don't really need much in the way of locking control, as you don't have to deal with multiple requests for the same resource. This is why folks just running the -L flag on their mount command also aren't having any problems.

To actually see the problem isn't too hard to set up. Just have rpc.lockd, rpc.statd, and rpcbind enabled on both the client and the server. Then just start trying to transfer a stack of files from one to the other. I found this to be true even trying to go from a 5.4 server to my 6.1 laptop here.

There was quite a thread on this back in March of this year, along with a few PRs that are still open. I'm personally just coming headlong into all of this.

Later on,

--
Michael Collette
IT Manager
TestEquity Inc
[EMAIL PROTECTED]
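A smaller reproduction than copying a stack of files is to request a single lock on an NFS-mounted file and see whether it is ever granted. The sketch below is not from the thread; it assumes rpcbind, rpc.statd and rpc.lockd are enabled on both ends (on FreeBSD typically via rpcbind_enable, rpc_statd_enable and rpc_lockd_enable in /etc/rc.conf), that the share is mounted without -L, and that /mnt/nfs/locktest is a placeholder path. The function name `try_nfs_lock` is made up for illustration.

```python
import fcntl
import os
import threading


def try_nfs_lock(path, timeout=5.0):
    """Return True if an exclusive POSIX lock on `path` is granted in time.

    The lock request runs in a helper thread so that a request stuck
    waiting on the lock daemon shows up as a timeout instead of hanging
    the caller.
    """
    result = {"locked": False}

    def worker():
        fd = None
        try:
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
            fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until granted
            result["locked"] = True
            fcntl.lockf(fd, fcntl.LOCK_UN)
        except OSError:
            pass                             # reported as "not locked"
        finally:
            if fd is not None:
                os.close(fd)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    # Still alive after the timeout means the lock call never returned.
    return (not thread.is_alive()) and result["locked"]
```

On a healthy setup `try_nfs_lock("/mnt/nfs/locktest")` should return True almost immediately. A False after the full timeout would be consistent with the stranded-lock behaviour described in this thread, though it does not by itself say which side is at fault.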
NFS Locking Issue
This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.

Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.

The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.

Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?

Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?

I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.

Later on,

--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]
Re: NFS Locking Issue
Rong-en Fan wrote:
> On 6/29/06, Michael Collette [EMAIL PROTECTED] wrote:
>> This last week I had been working on a test network to test out 6.1 prior to upgrading our production boxes from 5.4. That's when I ran across the rpc.lockd issues that have been discussed earlier.
>>
>> Our production setup has diskless clients running KDE, which due to this bug is now dead on 6.1. I also have my mail server delivering messages to a file server via NFS. I even have servers booting diskless with NFS provided file systems... all of which are dead on 6.1.
>>
>> The last discussion or bug updates I've seen on this issue were about 3 months ago. This leaves me with a number of questions I hope can be answered here on this list.
>>
>> Is NFS a big deal for most other users, or am I out here on the fringe using it as much as I do?
>>
>> Is anyone working on a fix for this? If so, is there any kind of time frame where this fix might be MFC'd to 6-STABLE?
>>
>> I guess I'm still just a bit stunned that a bug this obvious not only found its way into the STABLE branch, but is still there. Maybe it's not as obvious as I think, or not many folks are using it? All I know for sure here is that if I had upgraded to 6.1 my network would have been crippled.
>
> Try 6.1-STABLE, especially make sure you have
>
> $FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 2006/06/02 01:20:58 rodrigc Exp $
>
> for usr.sbin/rpc.lockd/kern.c, and see if this helps.

I am running STABLE on all my test boxes, and the problem is very much there. It's not everything that locks up though. I'm able to bring X up with twm, but unable to launch any Gnome or KDE applications without them being stranded in a lock state.

I sure would have loved for your suggestion to be correct. For what it's worth, all the boxes I'm working with are on STABLE no more than a week old. I ran fresh buildworlds on all of them before getting the rest of my configs going.

Thanks,

--
Michael Collette
IT Manager
TestEquity LLC
[EMAIL PROTECTED]
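For anyone wanting to confirm which kern.c revision a given box actually carries, the $FreeBSD$ tag quoted above can be parsed and compared. This is only a sketch under assumptions: the tag is taken to follow the usual CVS keyword layout shown in the quote, the helper name `parse_freebsd_ident` is hypothetical, and comparing revision tuples is only meaningful for revisions on the same branch (here 1.16.2.x).

```python
import re

# Layout assumed from the tag quoted in the thread:
#   $FreeBSD: <path>,v <revision> <date> <time> <author> <state> $
IDENT_RE = re.compile(
    r"\$FreeBSD: (?P<path>\S+),v (?P<rev>\d+(?:\.\d+)*) "
    r"(?P<date>\S+) (?P<time>\S+) (?P<author>\S+) (?P<state>\S+) \$"
)


def parse_freebsd_ident(text):
    """Return (path, revision tuple) from a $FreeBSD$ tag, or None."""
    match = IDENT_RE.search(text)
    if match is None:
        return None
    revision = tuple(int(part) for part in match.group("rev").split("."))
    return match.group("path"), revision


# The revision Rong-en Fan suggested checking for.
SUGGESTED = (1, 16, 2, 1)

tag = ("$FreeBSD: src/usr.sbin/rpc.lockd/kern.c,v 1.16.2.1 "
       "2006/06/02 01:20:58 rodrigc Exp $")
path, revision = parse_freebsd_ident(tag)
has_suggested_fix = revision >= SUGGESTED
```

The tag itself could come from the checked-out source file, or possibly from running ident(1) against the installed rpc.lockd binary if the tag is compiled in; both are assumptions about the local setup rather than something stated in the thread.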