Re: 6.x hangs on AMD64 again

2006-12-13 Thread Andrea Venturoli

Andrea Venturoli wrote:


I think I'm having the same problems.
I'm running 6.1(latest patch set)/amd64 on a dual-core Opteron Acer 
server with SCSI disks and it is hanging completely and suddenly. 
Checking the hardware was the first thing I did, but it really seems ok 
(unless it's the second core on the processor). Among other things, I
checked the HDs with the vendor's tools, the RAM with MemTest86+ and the
CPU with different stress tools. If anyone can suggest other diagnostics
I'd be happy to run them.
I compiled the kernel with debug info, but that's totally useless, since
it doesn't dump anything, it just hangs; I don't think even DDB would
help, since even the keyboard is not working at that time. If I'm
missing something, I'd be glad of any pointers.
The box features an em NIC on board, but since it shows a lot of 
problems, I removed that driver from the kernel (it's not possible to 
turn it off in the BIOS, though) and put in a different add-on card. I 
had some shared IRQs, but managed to solve that issue (even if I think 
it should not matter).

Next, I'll try to disable SMP as soon as I can and see if it helps.

Of course upgrading to 6.2 should be attempted, but since this is a 
production server and 6.2 is still at RC1...


This is just to say that, since SMP was disabled, I've had no problems 
at all.

Not that I like using UP on an x2 CPU...

 bye & Thanks
av.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 6.x hangs on AMD64 again

2006-11-23 Thread Andrea Venturoli

Kris Kennaway wrote:

On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:

If your system is hanging then you need to configure additional
debugging to figure out the cause.  Read the chapter on kernel
debugging in the Developers' Handbook; without this information no
developer can help you.

Kris

P.S. In my testing SMP amd64 is quite stable even under exceptionally
heavy loads, so it's either something related to your hardware or your
particular workload.
Hadn't considered that a user level debugging solution. I'll give it  
a try.

...

That is indeed almost always failing hardware.



Hello.
I think I'm having the same problems.
I'm running 6.1(latest patch set)/amd64 on a dual-core Opteron Acer 
server with SCSI disks and it is hanging completely and suddenly. 
Checking the hardware was the first thing I did, but it really seems ok 
(unless it's the second core on the processor). Among other things, I
checked the HDs with the vendor's tools, the RAM with MemTest86+ and the
CPU with different stress tools. If anyone can suggest other diagnostics
I'd be happy to run them.
I compiled the kernel with debug info, but that's totally useless, since
it doesn't dump anything, it just hangs; I don't think even DDB would
help, since even the keyboard is not working at that time. If I'm
missing something, I'd be glad of any pointers.
The box features an em NIC on board, but since it shows a lot of 
problems, I removed that driver from the kernel (it's not possible to 
turn it off in the BIOS, though) and put in a different add-on card. I 
had some shared IRQs, but managed to solve that issue (even if I think 
it should not matter).

Next, I'll try to disable SMP as soon as I can and see if it helps.

Of course upgrading to 6.2 should be attempted, but since this is a 
production server and 6.2 is still at RC1...


 bye & Thanks
av.


Re: 6.x hangs on AMD64 again

2006-11-23 Thread Kris Kennaway
On Thu, Nov 23, 2006 at 12:08:30PM +0100, Andrea Venturoli wrote:

 I compiled the kernel with debug info, but that's totally useless, since 
 it won't dump anything, just hang there; I don't think even DDB would 
 help, since even the keyboard is not working at that time.

Come on, you didn't even try it? :)

Kris



Re: 6.x hangs on AMD64 again

2006-11-12 Thread Jeff Hinrichs - DMT

On 11/11/06, Kris Kennaway [EMAIL PROTECTED] wrote:

On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:
 
 If your system is hanging then you need to configure additional
 debugging to figure out the cause.  Read the chapter on kernel
 debugging in the Developers' Handbook; without this information no
 developer can help you.
 
 Kris
 
 P.S. In my testing SMP amd64 is quite stable even under exceptionally
 heavy loads, so it's either something related to your hardware or your
 particular workload.

 Hadn't considered that a user level debugging solution. I'll give it
 a try.

 We had considered it possibly related to our mix because the SuperMicro
 dual xeon we are trying to replace it with was rebooting (not hanging)
 without any error messages every 15-20 days. I thought it was failing
 hardware. It's on 6.1 R P10. Maybe related in some way.

That is indeed almost always failing hardware.

Kris





I had a similar issue of rebooting or, more specifically, shutting
down.  The BIOS would also lose its config.  After some hardware
swapping it turned out to be the power supply.

--
Jeff Hinrichs
Dundee Media & Technology, Inc
[EMAIL PROTECTED]


Re: 6.x hangs on AMD64 again

2006-11-11 Thread Chris


If your system is hanging then you need to configure additional
debugging to figure out the cause.  Read the chapter on kernel
debugging in the Developers' Handbook; without this information no
developer can help you.

Kris

P.S. In my testing SMP amd64 is quite stable even under exceptionally
heavy loads, so it's either something related to your hardware or your
particular workload.


Hadn't considered that a user level debugging solution. I'll give it  
a try.


We had considered it possibly related to our mix because the SuperMicro
dual xeon we are trying to replace it with was rebooting (not hanging)
without any error messages every 15-20 days. I thought it was failing
hardware. It's on 6.1 R P10. Maybe related in some way.

So much to learn, so little time. Thank you for your response.
Chris Pratt


Re: 6.x hangs on AMD64 again

2006-11-11 Thread Kris Kennaway
On Sat, Nov 11, 2006 at 11:15:54AM -0800, Chris wrote:
 
 If your system is hanging then you need to configure additional
 debugging to figure out the cause.  Read the chapter on kernel
 debugging in the Developers' Handbook; without this information no
 developer can help you.
 
 Kris
 
 P.S. In my testing SMP amd64 is quite stable even under exceptionally
 heavy loads, so it's either something related to your hardware or your
 particular workload.
 
 Hadn't considered that a user level debugging solution. I'll give it  
 a try.
 
 We had considered it possibly related to our mix because the SuperMicro
 dual xeon we are trying to replace it with was rebooting (not hanging)
 without any error messages every 15-20 days. I thought it was failing
 hardware. It's on 6.1 R P10. Maybe related in some way.

That is indeed almost always failing hardware.

Kris




6.x hangs on AMD64 again

2006-11-10 Thread Chris
I've posted several questions (under two other ids though the name Chris)
since March trying to put up a Tyan quad dual s4882. I've run it on 6.0
STABLE as of about March, 6.1 RELEASE in several flavors from May
through September and finally 6.2 PRERELEASE as of mid-October. I
found issues early on with transition states on the bge interface, found
a memory chip that was marginal and have tested and tested throughout
this period. Every time we place the system back in production, we see a
hang without any indications of what the problem would be, after 4-7 days
of running.

I've tried to think of where the problems could be and it would seem that
6.x AMD64 exhibits this type of issue for many individuals who put a
server under heavy load. I've seen many unresolved posts here and
elsewhere that describe strikingly similar scenarios. When in full production,
it's running 5 websites out of a prefork non-ssl Apache 2.2.3, light
ports-installed mysql 4.19 access via perl cgi (not mod_perl) and heavy
access to perl generated and flat html archives pages (for discussion just
counted 300K page views for a day on one of the sites). This computer
does not breathe hard at all with peak hours showing top staying at 80+%
idle. I've not opened up any service to where it can fill the 8GB RAM in
spawning too many processes. Process count peaks at about 180 because
it services the request backlog so quickly. Active memory is usually about
250 MB and inactive varies. The configuration is very simple and it runs
nothing else other than rsyncd and sshd. The hang seems to have nothing
to do with peak access times; in fact, it will suddenly hang at our slowest
time of the day. I ran for over a month without a hang when leaving the
machine relegated to low traffic websites.

We've spent a lot to get clean dedicated power and installed a monitoring
hardware device to let us see what's going on, no help. Temperature of
the computer room is nicely down given that it's winter here and the
facility is kept fairly cold. No AC but the computer room remains about
70 degrees F.

I'm aware of the warning about 6.2 PR in production but the symptoms
have not deviated amongst any 6.x version and 6.2 PR was the only way
to pick up the extensive changes to the bge driver without hacking. I need
opinions on how to debug and possibly even who I should go to and pay
to take a closer look at this scenario. Here are questions and ideas I've
thought of, is there any validity in these or have you other ideas?

1. I've wondered if AMD64 SMP was a bad idea. Should I be using i386
for stability? It's one thing I've not tried.
2. Should acpi be off as a precaution just to rule it out? It's not blacklisted.
I'd turned it off for a long time when testing but the results were muddy.
3. Should I reduce the system to 4GB ram to attempt to skirt the issue?
Is 6.x less reliable over 4GB?
4. Where can I find the meanings of all vmstat -z variables, I'm dumping
them to another server every two minutes giving the percentage change
on each sample, but am unsure if I can correlate this to much of
anything meaningful without good definitions. Just started this but will
need information.
5. Does mysql use linux threads and could that be the mistake that's
taking us out?
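
The sampling described in item 4 could be sketched roughly as follows.
This is only an illustration, not the script actually in use: the
function names parse_vmstat_z and percent_change are hypothetical, and
the column layout (SIZE, LIMIT, USED, FREE, REQUESTS after the zone
name) is an assumption about the vmstat -z output that should be checked
against the running system.

```python
# Sketch: compare two saved "vmstat -z" samples and report the percentage
# change per zone. Assumed line layout (verify on the real system):
#   zone name: SIZE, LIMIT, USED, FREE, REQUESTS


def parse_vmstat_z(text):
    """Map each zone name to its list of integer columns."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or blank line, no "name:" prefix
        try:
            zones[name.strip()] = [int(field) for field in rest.split(",")]
        except ValueError:
            continue  # skip lines whose columns are not all integers
    return zones


def percent_change(before, after, column=2):
    """Percentage change of one column (default: assumed USED) per zone."""
    changes = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None or len(old) <= column or len(new) <= column:
            continue  # zone missing from the earlier sample, or short line
        if old[column] == 0:
            continue  # a percentage of zero is undefined
        changes[name] = 100.0 * (new[column] - old[column]) / old[column]
    return changes


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_1 = "mbuf: 256, 0, 100, 20, 5000\n"
    sample_2 = "mbuf: 256, 0, 150, 20, 9000\n"
    result = percent_change(parse_vmstat_z(sample_1), parse_vmstat_z(sample_2))
    for zone, pct in sorted(result.items()):
        print("%-20s %+.1f%%" % (zone, pct))
```

Working from saved text rather than calling vmstat directly means the
samples shipped to the other server can be compared after a hang, when
the machine itself is no longer responding.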

Even wild goose chases will be welcome at this point ;-).

Thanks,
Chris Pratt



Re: 6.x hangs on AMD64 again

2006-11-10 Thread Kris Kennaway
On Fri, Nov 10, 2006 at 09:16:02AM -0800, Chris wrote:
 I've posted several questions (under two other ids though the name Chris)
 since March trying to put up a Tyan quad dual s4882. I've run it on 6.0
 STABLE as of about March, 6.1 RELEASE in several flavors from May
 through September and finally 6.2 PRERELEASE as of mid-October. I
 found issues early on with transition states on the bge interface, found
 a memory chip that was marginal and have tested and tested throughout
 this period. Every time we place the system back in production, we see a
 hang without any indications of what the problem would be, after 4-7 days
 of running.

If your system is hanging then you need to configure additional
debugging to figure out the cause.  Read the chapter on kernel
debugging in the Developers' Handbook; without this information no
developer can help you.

Kris

P.S. In my testing SMP amd64 is quite stable even under exceptionally
heavy loads, so it's either something related to your hardware or your
particular workload.

