Re: 6.x hangs on AMD64 again
Andrea Venturoli wrote:
> [...]
> Next, I'll try to disable SMP as soon as I can and see if it helps.
> Of course upgrading to 6.2 should be attempted, but since this is a
> production server and 6.2 is still at RC1...

This is just to say that, since SMP was disabled, I've had no problems at all. Not that I like using UP on an x2 CPU...

bye & Thanks
av.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: 6.x hangs on AMD64 again
On Thu, Nov 23, 2006 at 12:08:30PM +0100, Andrea Venturoli wrote:
> I compiled the kernel with debug info, but that's totally useless, since
> it won't dump anything, just hang there; I don't think even DDB would
> help, since even the keyboard is not working at that time.

Come on, you didn't even try it? :)

Kris
Re: 6.x hangs on AMD64 again
Kris Kennaway wrote:
> On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:
> > [...]
> That is indeed almost always failing hardware.

Hello.

I think I'm having the same problems. I'm running 6.1 (latest patch set)/amd64 on a dual-core Opteron Acer server with SCSI disks, and it hangs completely and suddenly.

Checking the hardware was the first thing I did, but it really seems OK (unless it's the second core on the processor). Among other things, I checked the HDs with the vendor's tools, the RAM with MemTest86+ and the CPU with different stress tools. If anyone can suggest other diagnostics, I'd be happy to comply.

I compiled the kernel with debug info, but that's totally useless, since it won't dump anything, just hang there; I don't think even DDB would help, since even the keyboard is not working at that time. If I'm missing something, I'd be glad to be pointed in the right direction.

The box features an em NIC on board, but since it showed a lot of problems, I removed that driver from the kernel (it's not possible to turn it off in the BIOS, though) and put in a different add-on card. I had some shared IRQs, but managed to solve that issue (even though I don't think it should matter).

Next, I'll try to disable SMP as soon as I can and see if it helps. Of course upgrading to 6.2 should be attempted, but since this is a production server and 6.2 is still at RC1...

bye & Thanks
av.
Re: 6.x hangs on AMD64 again
On 11/11/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:
> > [...]
> > We had considered it possibly related to our mix because the SuperMicro
> > dual xeon we are trying to replace it with was rebooting (not hanging)
> > without any error messages every 15-20 days. I thought it was failing
> > hardware. It's on 6.1 R P10. Maybe related in some way.
>
> That is indeed almost always failing hardware.
>
> Kris

I had a similar issue with rebooting, or more specifically shutting down. The BIOS would also lose its config. After some hardware swapping, it turned out to be the power supply.

-- 
Jeff Hinrichs
Dundee Media & Technology, Inc
[EMAIL PROTECTED]
Re: 6.x hangs on AMD64 again
On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:
> > [...]
> We had considered it possibly related to our mix because the SuperMicro
> dual xeon we are trying to replace it with was rebooting (not hanging)
> without any error messages every 15-20 days. I thought it was failing
> hardware. It's on 6.1 R P10. Maybe related in some way.

That is indeed almost always failing hardware.

Kris
Re: 6.x hangs on AMD64 again
> If your system is hanging then you need to configure additional
> debugging to figure out the cause. [...]
>
> P.S. In my testing SMP amd64 is quite stable even under exceptionally
> heavy loads, so it's either something related to your hardware or your
> particular workload.

I hadn't considered that a user-level debugging option. I'll give it a try.

We had considered it possibly related to our mix because the SuperMicro dual xeon we are trying to replace it with was rebooting (not hanging) without any error messages every 15-20 days. I thought it was failing hardware. It's on 6.1 R P10. Maybe related in some way.

So much to learn, so little time. Thank you for your response.

Chris Pratt
Re: 6.x hangs on AMD64 again
On Fri, Nov 10, 2006 at 09:16:02AM -0800, Chris wrote:
> [...] Every time we place the system back in production, we see a
> hang without any indications of what the problem would be, after 4-7
> days of running.

If your system is hanging then you need to configure additional debugging to figure out the cause. Read the chapter on kernel debugging in the developers' handbook; without this information no developer can help you.

Kris

P.S. In my testing SMP amd64 is quite stable even under exceptionally heavy loads, so it's either something related to your hardware or your particular workload.
6.x hangs on AMD64 again
I've posted several questions (under two other IDs, though with the name Chris) since March while trying to put up a Tyan quad dual S4882. I've run it on 6.0 STABLE as of about March, 6.1 RELEASE in several flavors from May through September and finally 6.2 PRERELEASE as of mid-October. I found issues early on with transition states on the bge interface, found a memory chip that was marginal, and have tested and tested throughout this period. Every time we place the system back in production, we see a hang without any indications of what the problem would be, after 4-7 days of running.

I've tried to think of where the problems could be, and it would seem that 6.x AMD64 exhibits this type of issue for many people who put a server under heavy load. I've seen many unresolved posts here and elsewhere that describe strikingly similar scenarios.

When in full production, it's running 5 websites out of a prefork non-SSL Apache 2.2.3, light access to a ports-installed MySQL 4.19 via Perl CGI (not mod_perl), and heavy access to Perl-generated and flat HTML archive pages (for reference, I just counted 300K page views in one day on one of the sites). This computer does not breathe hard at all: even at peak hours, top shows it staying 80+% idle. I've not opened up any service to the point where it could fill the 8 GB of RAM by spawning too many processes. The process count peaks at about 180 because it services the request backlog so quickly. Active memory is usually about 250 MB, and inactive varies. The configuration is very simple, and it runs nothing other than rsyncd and sshd.

The hang seems to have nothing to do with peak access times; in fact, it will suddenly hang at our slowest time of the day. I ran for over a month without a hang when the machine was relegated to low-traffic websites. We've spent a lot to get clean dedicated power and installed a hardware monitoring device to let us see what's going on, but it didn't help.
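Since the hang reportedly does not line up with peak traffic, one way to check that is to pin down exactly when periodic samples stopped arriving at a second machine. A minimal Python sketch, assuming the collector records the arrival time of each sample; the two-minute interval, the `find_hang` name and the tolerance of two missed intervals are illustrative assumptions, not anything from the thread:

```python
from datetime import datetime, timedelta


def find_hang(timestamps, interval=timedelta(minutes=2), tolerance=2, end=None):
    """Locate the first point where periodic samples stopped arriving.

    Hypothetical helper. Returns (last_good, resumed): last_good is the final
    sample before a gap longer than tolerance * interval; resumed is the first
    sample after the gap (e.g. after a reboot), or None if samples never came
    back before `end`. Returns None when no such gap exists.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return None
    limit = interval * tolerance
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > limit:
            return prev, cur  # samples resumed, e.g. after a reboot
    # No internal gap: check whether the series simply stopped (a hard hang).
    if end is not None and end - ordered[-1] > limit:
        return ordered[-1], None
    return None


if __name__ == "__main__":
    start = datetime(2006, 11, 10, 3, 0)  # made-up sample times
    samples = [start + timedelta(minutes=2 * i) for i in range(5)]
    print(find_hang(samples, end=start + timedelta(hours=1)))
```

The `last_good` timestamp can then be compared against traffic figures for the same period, and the sample collected at that moment is the last view of the system before it stopped responding.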
The computer room is nicely cool, given that it's winter here and the facility is kept fairly cold. There's no AC, but the computer room stays at about 70 degrees F.

I'm aware of the warning about running 6.2 PR in production, but the symptoms have not differed across any 6.x version, and 6.2 PR was the only way to pick up the extensive changes to the bge driver without hacking.

I need opinions on how to debug this, and possibly even on whom I should go to and pay to take a closer look at this scenario. Here are the questions and ideas I've thought of. Is there any validity in these, or do you have other ideas?

1. I've wondered if AMD64 SMP was a bad idea. Should I be using i386 for stability? It's one thing I've not tried.
2. Should ACPI be off as a precaution, just to rule it out? It's not blacklisted. I'd turned it off for a long time when testing, but the results were muddy.
3. Should I reduce the system to 4GB of RAM to attempt to skirt the issue? Is 6.x less reliable with more than 4GB?
4. Where can I find the meanings of all the vmstat -z variables? I'm dumping them to another server every two minutes, giving the percentage change on each sample, but I'm unsure whether I can correlate this with anything meaningful without good definitions. I just started this, but will need information.
5. Does MySQL use Linux threads, and could that be the mistake that's taking us out?

Even wild goose chases will be welcome at this point ;-).

Thanks,
Chris Pratt
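Question 4 describes dumping `vmstat -z` every two minutes and computing the percentage change per sample. In each `vmstat -z` row, SIZE is the item size in bytes, LIMIT the maximum number of items (0 meaning no limit), USED the items currently allocated, FREE the items cached in the zone, and REQUESTS the cumulative number of allocations. A minimal Python sketch of that comparison, assuming the FreeBSD 6.x column order (ITEM, SIZE, LIMIT, USED, FREE, REQUESTS); the functions `parse_vmstat_z` and `pct_change` and the sample rows are hypothetical, made up for illustration rather than taken from Chris's machine:

```python
# Column order assumed for FreeBSD 6.x `vmstat -z`; newer releases append
# extra columns, which zip() below silently drops.
FIELDS = ("size", "limit", "used", "free", "requests")


def parse_vmstat_z(text, fields=FIELDS):
    """Parse `vmstat -z` output into {zone name: {field: int}}."""
    zones = {}
    for line in text.splitlines():
        # Zone names may contain spaces ("UMA Kegs"), so split on the last
        # colon: everything after it is the comma-separated counters.
        name, sep, rest = line.rpartition(":")
        if not sep or not name.strip():
            continue  # header or blank line
        try:
            values = [int(v) for v in rest.split(",")]
        except ValueError:
            continue  # not a counter row
        zones[name.strip()] = dict(zip(fields, values))
    return zones


def pct_change(old, new, field="used"):
    """Percentage change of one field per zone between two samples.

    Zones missing from either sample are skipped; growth from zero is
    reported as None because a percentage is undefined there.
    """
    changes = {}
    for zone, now in new.items():
        before = old.get(zone)
        if before is None or field not in before or field not in now:
            continue
        if before[field] == 0:
            changes[zone] = 0.0 if now[field] == 0 else None
        else:
            changes[zone] = 100.0 * (now[field] - before[field]) / before[field]
    return changes


if __name__ == "__main__":
    before = parse_vmstat_z("UMA Kegs:  140,  0,  72,  0,  72\nmbuf:  256,  0,  100,  50,  1000\n")
    after = parse_vmstat_z("UMA Kegs:  140,  0,  72,  0,  72\nmbuf:  256,  0,  150,  30,  1800\n")
    print(pct_change(before, after))  # {'UMA Kegs': 0.0, 'mbuf': 50.0}
```

Tracking USED rather than REQUESTS is a deliberate choice here: REQUESTS only ever grows, while a USED count in one zone that keeps climbing across days, without falling back during quiet hours, would be the more interesting signal to look at before a hang.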