Re: 6.0 random freezes

2005-12-14 Thread fredthetree
i've only used the generic 6.0 kernel

# kgdb kernel.debug /var/crash/vmcore.1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so:
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as i386-marcel-freebsd.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x10
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc0a7cf08
stack pointer   = 0x28:0xd56a694c
frame pointer   = 0x28:0x0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 29 (swi1: net)
trap number = 12
panic: page fault
Uptime: 1d23h40m51s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335
319 303 287 271 255 239 223 207 (CTRL-C to abort)  (CTRL-C to abort)
(CTRL-C to abort)  191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc0638202 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc0638498 in panic (fmt=0xc084e5a2 %s)
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc0807c30 in trap_fatal (frame=0xd56a690c, eva=16)
at /usr/src/sys/i386/i386/trap.c:831
#4  0xc080799b in trap_pfault (frame=0xd56a690c, usermode=0, eva=16)
at /usr/src/sys/i386/i386/trap.c:742
#5  0xc08075d9 in trap (frame=
  {tf_fs = -1038680056, tf_es = 40, tf_ds = 40, tf_edi = 0, tf_esi =
-646886620, tf_ebp = 0, tf_isp = -714446536, tf_ebx = -646862464, tf_edx =
791735, tf_ecx = -1073475471, tf_eax = 1, tf_trapno = 12, tf_err = 0, tf_eip
= -1062744312, tf_cs = 32, tf_eflags = 66050, tf_esp = 16798208, tf_ss = 0})
at /usr/src/sys/i386/i386/trap.c:432
#6  0xc07f6dca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0a7cf08 in ?? ()
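
Frame #5 prints the trap frame's registers as signed 32-bit decimals, which hides the fact that they match the hex values in the panic message. The Python sketch below (standard library only; the helper names are illustrative) reinterprets tf_eip as an unsigned 32-bit value and decodes tf_eflags and tf_err using the standard i386 EFLAGS and page-fault error-code bit positions. Under those assumptions, tf_eip comes out as 0xc0a7cf08, the same address as the instruction pointer and frame #7, and tf_err = 0 decodes to the "supervisor read, page not present" fault code.

```python
# Decode a few i386 trap-frame fields copied from frame #5 above.

def to_u32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"

# i386 EFLAGS bits mentioned in the panic message.
EFLAGS_IF = 1 << 9    # interrupt enable flag
EFLAGS_RF = 1 << 16   # resume flag


def decode_eflags(eflags: int) -> dict:
    return {
        "interrupt enabled": bool(eflags & EFLAGS_IF),
        "resume": bool(eflags & EFLAGS_RF),
        "IOPL": (eflags >> 12) & 0x3,   # bits 12-13
    }


def decode_pf_error(err: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user" if err & 0x4 else "supervisor"
    access = "write" if err & 0x2 else "read"
    cause = "protection violation" if err & 0x1 else "page not present"
    return f"{mode} {access}, {cause}"


if __name__ == "__main__":
    print("eip:", to_u32_hex(-1062744312))   # 0xc0a7cf08, same as frame #7
    print("cs: ", to_u32_hex(32))            # 0x00000020
    print("eflags:", decode_eflags(66050))
    print("fault code:", decode_pf_error(0))
```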


On 12/13/05, Peter Jeremy [EMAIL PROTECTED] wrote:

 On Tue, 2005-Dec-13 13:43:13 -0400, fredthetree wrote:
 [/var/crash/vmcore.1]
 --
 Unread portion of the kernel message buffer:
 
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x10

 That's a NULL pointer de-reference - it Shouldn't Happen(TM).  Can you
 get a backtrace from kgdb (where)?

 [vmcore.0]
 --
 Unread portion of the kernel message buffer:
 ÃwÄ0¿Á0ÂÁÀíÁÀJðÂÄüÂ3ÄÓÂÀíÁÀóÂDþÁÀóÂðCÂÀíÁ1Ä
 ÃðÚÃÀíÁ´ÂÄBð°ÄÁÀíÁ[EMAIL PROTECTED]@
 --

 The most likely problem is that your vmcore file doesn't match your
 kernel.
 Are you running kgdb with the same kernel as was running when the system
 crashed?  (If you don't have that kernel handy, you might as well delete
 vmcore.0).

 --
 Peter Jeremy

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 6.0 random freezes

2005-12-14 Thread Peter Jeremy
On Wed, 2005-Dec-14 08:28:26 -0400, fredthetree wrote:
i've only used the generic 6.0 kernel

# kgdb kernel.debug /var/crash/vmcore.1
...
#6  0xc07f6dca in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0a7cf08 in ?? ()

Unfortunately, it's frame 7 and below that is crucial.  Was #7 the
last line or did you cut the backtrace off?  The top frames are the
kernel handling the trap.  It looks like the trap occurred in a KLD -
in this case, try running:
  # cd /usr/obj/usr/src/sys/GENERIC  (or name of kernel config)
  # make gdbinit      [this just copies a few config files for kgdb]
  # kgdb kernel.debug /var/crash/vmcore.1
  (kgdb) kldsyms
  (kgdb) where

Hopefully this will decode #7 and you can provide a few more frames.
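
One way to check the suspicion that the trap occurred in a KLD is an address-range comparison: the faulting instruction pointer 0xc0a7cf08 is compared against the load address and size of each loaded kernel file, as listed by kldstat(8). The Python sketch below performs that check. The kldstat listing in it is made up (including the module name example.ko); loading the module symbols in kgdb, as suggested above, is still what actually names the function.

```python
# Find which loaded kernel file contains an address, given kldstat-style
# output. The sample listing is hypothetical, not taken from this system.

SAMPLE_KLDSTAT = """\
Id Refs Address    Size     Name
 1    8 0xc0400000 4a6a38   kernel
 2    1 0xc0a6a000 1e000    example.ko
"""


def parse_kldstat(text: str):
    """Return (name, start, end) tuples from kldstat output."""
    entries = []
    for line in text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue
        start = int(fields[2], 16)          # Address column, e.g. 0xc0400000
        size = int(fields[3], 16)           # Size column is hex without "0x"
        entries.append((fields[4], start, start + size))
    return entries


def find_module(entries, addr: int):
    """Name of the file whose [start, end) range contains addr, else None."""
    for name, start, end in entries:
        if start <= addr < end:
            return name
    return None


print(find_module(parse_kldstat(SAMPLE_KLDSTAT), 0xc0a7cf08))  # example.ko
```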

-- 
Peter Jeremy


Re: 6.0 random freezes

2005-12-14 Thread Atanas

Peter Jeremy said the following on 12/13/05 02:00:


Note that PS/2 keyboards aren't hot-pluggable and attempts to do so
can have deleterious effects on your keyboard and/or motherboard.  In
any case, the probe/attach sequence relies on the kernel being in a
reasonably sane state (and I'm not sure if it will detect the keyboard
as a console device except at boot time).

I agree, but the keyboard is a passive device (with no power source, 
i.e. mostly harmless), and it's standard practice to have only a few 
movable consoles for several racks and plug them in only where 
necessary. It has always worked for us and I don't remember any 
hot-plugging accidents in years.



If the keyboard has been plugged in since the system booted, do you
still get the same no response?  If so, the kernel has wedged at
a fairly low level and I'm not quite sure how to proceed other than
by enabling the sanity checks that other people have mentioned
(eg WITNESS, INVARIANTS) and hoping they catch something.

I cannot say for sure. When it happens I'm usually away, and until I 
get there, the console could have been used by someone. I'm in the 
process of getting a serial console, so if there's still no response 
there, I will enable the sanity checks.



I only mentioned serial consoles on the off-chance that you had one.
Whilst it may not help here, serial consoles have a number of
advantages when managing remote equipment


Thanks for pointing this out. As I said, I'm in the process of getting 
one now, and possibly equipping some dozens of servers with them later.


After the downgrade we could eventually set up a test bed and start 
hammering it with requests. The problem would be how to trigger the 
crash and whether we would be able to reproduce it at all.


I have already gone the 5.4 downgrade route. Actually I was forced to 
do so the other night, when one of the machines started hanging every 
half hour or so. It looks like the background fsck on the slower 
SATA-based RAID5 array contributed a lot to that.


Now I have the test bed online. This is the very same server 
(SCSI-based, with the OS drive intact and the production data drives 
moved elsewhere) that was crashing once a day or so. Hopefully tomorrow 
I will have a serial console attached to it, so we can start pounding 
it. I hope this machine won't need to go into production during the 
next month or so and we'll have enough time for tests.


 Depending on your application and the interfaces to it, it might be
 feasible to either tee live traffic into both systems and just junk
 the responses from your test bed, or record live traffic and
 replay it into your test bed.

It runs a fairly complex set of services. It's a shared web hosting 
server handling some hundreds of websites, plus email (SMTP/POP3/IMAP), 
databases (MySQL), FTP, DNS, etc.


I don't know how easy it would be to implement such traffic capture 
and replay on the test bed. It seems kind of complicated at first 
sight (though I realize it might be the only way to reproduce the 
crash). We might need some NAT (via ipfw?), some services might not 
like their responses being junked, etc.


I was thinking about trying the kernel stress suite first. Or just have 
something rsync-ing lots of files back and forth (possibly over the 
network), run ApacheBench in a loop and point it at some 
database-intensive page, etc.
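
A Python sketch of the "run ApacheBench in a loop" idea, for cases where ab(1) isn't convenient. It issues a batch of concurrent GET requests from a thread pool and counts successes and failures. TARGET_URL is a hypothetical placeholder for the database-intensive page, and the fetcher parameter exists only so the loop can be exercised without a network.

```python
# Minimal concurrent HTTP load generator, roughly what repeated ab(1) runs
# do. TARGET_URL is a hypothetical placeholder.
import concurrent.futures
import urllib.request

TARGET_URL = "http://testbed.example/db-heavy-page"  # hypothetical test-bed URL


def fetch(url: str, timeout: float = 10.0) -> int:
    """GET url once, drain the body, and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def hammer(url: str, requests: int, concurrency: int, fetcher=fetch) -> dict:
    """Issue `requests` GETs using `concurrency` worker threads.

    A 200 response counts as "ok"; any other status or any exception
    (timeout, refused connection, HTTP error) counts as "error".
    """
    counts = {"ok": 0, "error": 0}
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fetcher, url) for _ in range(requests)]
        for fut in concurrent.futures.as_completed(futures):
            try:
                status = fut.result()
            except Exception:
                counts["error"] += 1
            else:
                counts["ok" if status == 200 else "error"] += 1
    return counts
```

To keep the test bed busy, call hammer(TARGET_URL, requests=1000, concurrency=50) repeatedly from a while loop and log the counts. A rising error count is a cheap early signal that something is wedging.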


Regards,
Atanas


Re: 6.0 random freezes

2005-12-13 Thread Atanas

Atanas said the following on 12/12/05 18:57:

Peter Jeremy said the following on 12/12/05 13:40:



 When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
 a serial console).

 But if I have no keyboard response I won't be able to save it, right?

(replying to myself)
This is exactly what I was afraid would happen. The SATA-based box just 
hung again, with all of the kernel debugging options in place:


  makeoptions   DEBUG=-g
  options   KDB
  options   DDB

But I wasn't able to do anything with the keyboard in order to save a 
crashdump, so I had no choice but to hit the reset button.


Regards,
Atanas


Re: 6.0 random freezes

2005-12-13 Thread Peter Jeremy
On Mon, 2005-Dec-12 18:57:24 -0800, Atanas wrote:
When I plug in a keyboard, there's no response at all - no LEDs, no VTY 
switching, no Ctrl-Alt-Esc, etc. You might think of hint.atkbd.0.flags not
being set properly, but it's right (i.e. unchanged; it appears to default
to that on i386 5.x+) and other machines with identical configuration do
accept a keyboard.

Note that PS/2 keyboards aren't hot-pluggable and attempts to do so
can have deleterious effects on your keyboard and/or motherboard.  In
any case, the probe/attach sequence relies on the kernel being in a
reasonably sane state (and I'm not sure if it will detect the keyboard
as a console device except at boot time).

If the keyboard has been plugged in since the system booted, do you
still get the same no response?  If so, the kernel has wedged at
a fairly low level and I'm not quite sure how to proceed other than
by enabling the sanity checks that other people have mentioned
(eg WITNESS, INVARIANTS) and hoping they catch something.

the next crash. But if I have no keyboard response I won't be able to 
save it, right?

True.  But DDB has been designed to rely on a fairly minimal subset of
kernel functionality and often works, even though the system appears frozen.

I do not know what a serial console is and would need some time to get 
familiar with it. Would I get something in addition to what I can get 
from the standard console?

I only mentioned serial consoles on the off-chance that you had one.
Whilst it may not help here, serial consoles have a number of
advantages when managing remote equipment (ie equipment not sitting on
your desk) - you can access the console remotely and log all console
output on another computer (either cross-connect com1/com2 on pairs of
hosts or get a multi-port serial card to manage a number of systems).
If your BIOS can handle serial communications, you virtually never
need to physically visit your servers.  For details, check out:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/serialconsole-setup.html
I personally use ports/comms/conserver-com to manage about 50 assorted
Unix/FreeBSD servers and switches at work.

The dumpdev variable seems to default to AUTO, i.e. trying to use the 
first swap device if it's bigger than the RAM (in my case yes), so I 
guess I don't need to touch it.

It seems that my suggestion has been obsoleted by changes to the startup
scripts since I checked last.

After the downgrade we could eventually set up a test bed and start 
hammering it with requests. The problem would be how to trigger the 
crash and whether we would be able to reproduce it at all.

Depending on your application and the interfaces to it, it might be
feasible to either tee live traffic into both systems and just junk
the responses from your test bed, or record live traffic and
replay it into your test bed.

-- 
Peter Jeremy


Re: 6.0 random freezes

2005-12-13 Thread fredthetree
On 12/13/05, fredthetree [EMAIL PROTECTED] wrote:

 On 12/12/05, Peter Jeremy [EMAIL PROTECTED] wrote:

  On Mon, 2005-Dec-12 22:21:52 -0400, fredthetree wrote:
  I just wanted to chime in and say I've had my 6.0-RELEASE #0 freeze up
  twice in the past few days.  never once had it happen with 5.x.
  everything locks, no keyboard response, no mouse, and after several
  minutes it reboots itself, and savecore starts up during boot..
 
  This suggests you've had a panic (or something that develops into
  one).  If you've got a crashdump, you can probably get enough
  information out of it for people to get an idea of what is wrong.
  Please have a look at:
  http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebg-gdb.html
 
 
  --
  Peter Jeremy


 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
 (fixed the link)


 does the following help?

 [/var/crash/vmcore.1]
 --
 Unread portion of the kernel message buffer:


 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x10
 fault code  = supervisor read, page not present
 instruction pointer = 0x20:0xc0a7cf08
 stack pointer   = 0x28:0xd56a694c
 frame pointer   = 0x28:0x0
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 29 (swi1: net)
 trap number = 12
 panic: page fault
 Uptime: 1d23h40m51s
 Dumping 511 MB (2 chunks)
   chunk 0: 1MB (159 pages) ... ok
   chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351
 335 319 303 287 271 255 239 223 207 (CTRL-C to abort)  (CTRL-C to abort)
 (CTRL-C to abort)  191 175 159 143 127 111 95 79 63 47 31 15

 #0  doadump () at pcpu.h:165
 165 pcpu.h: No such file or directory.
 in pcpu.h
 --

 [vmcore.0]
 --
 Unread portion of the kernel message buffer:
 ÃwÄ0¿Á0ÂÁÀíÁÀJðÂÄüÂ3ÄÓÂÀíÁÀóÂDþÁÀóÂðCÂÀíÁ1Ä
 ÃðÚÃÀíÁ´ÂÄBð°ÄÁÀíÁ[EMAIL PROTECTED]@
 --
 (after this text displays, i am unable to view the kgdb prompt... type
 exit [return] three times and i get back to the shell...)





Re: 6.0 random freezes

2005-12-13 Thread Luke Dean



On Tue, 13 Dec 2005, Atanas wrote:


Atanas said the following on 12/12/05 18:57:

Peter Jeremy said the following on 12/12/05 13:40:


When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
a serial console).


But if I have no keyboard response I won't be able to save it, right?


(replying to myself)
This is exactly what I was afraid would happen. The SATA-based box just 
hung again, with all of the kernel debugging options in place:


 makeoptions   DEBUG=-g
 options   KDB
 options   DDB

But I wasn't able to do anything with the keyboard in order to save a 
crashdump, so I had no choice but to hit the reset button.


Regards,
Atanas


I posted this same problem recently.
My latest attempt to troubleshoot the freezes was to snatch the SATA card 
out of the box.
The machine has been running without any problems since.  That was six 
days, 12 hours ago - the longest uptime I've had since I upgraded the 
machine to version 6.
The only reason I added the SATA controller to the box was to set up a 
gmirror to make backups to, and since the machine kept freezing all the 
time I couldn't make decent backups anyway, so removing the SATA card 
didn't change the machine's duties at all.
The reason I suspected the card might be the problem is that I installed 
it at the same time I upgraded to FreeBSD 6 (when the problems started), 
the card only cost $20, and the drives attached to the card were getting 
corrupted during crashes even though they weren't in use.  The card was a 
SYBA SD-SATA-4P.  I've also got an rl0 Ethernet interface on the PCI bus,
and I wondered if it might be some kind of bus-mastering conflict or
something.
I still don't know if the problem was bad cheap hardware, bad interactions 
between cheap hardware, or a software problem. 
I suppose downgrading to 5.4 again might give some clues, but I don't 
really want to do that right now since the system finally seems to be 
stable again, albeit without my large disk array.
This may easily have nothing to do with your problem, since we have so 
little information to go on, but the symptoms sound the same.



Re: 6.0 random freezes

2005-12-13 Thread Peter Jeremy
On Tue, 2005-Dec-13 13:43:13 -0400, fredthetree wrote:
[/var/crash/vmcore.1]
--
Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x10

That's a NULL pointer de-reference - it Shouldn't Happen(TM).  Can you
get a backtrace from kgdb (where)?
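
Why a fault address of 0x10 suggests a NULL pointer: when code reads a structure member through a pointer that is NULL, the address accessed is just the member's offset. The Python sketch below shows the arithmetic with ctypes and a completely hypothetical structure (ExampleSoftc is invented for illustration; the real structure involved in this panic is unknown until frame #7 is decoded). In it, the fifth 32-bit field sits at offset 16.

```python
# Why base->member with base == NULL faults at a small address.
# ExampleSoftc is a hypothetical layout, not a real FreeBSD structure.
import ctypes


class ExampleSoftc(ctypes.Structure):
    # Each c_uint32 is 4 bytes with 4-byte alignment, so there is no
    # padding and `flags` starts at offset 16 (0x10).
    _fields_ = [
        ("unit", ctypes.c_uint32),
        ("state", ctypes.c_uint32),
        ("count", ctypes.c_uint32),
        ("refs", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
    ]


def faulting_address(base: int, field: str) -> int:
    """Address the CPU would read for base->field."""
    return base + getattr(ExampleSoftc, field).offset


print(hex(faulting_address(0, "flags")))  # 0x10
```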

[vmcore.0]
--
Unread portion of the kernel message buffer:
ÃwÄ0¿Á0ÂÁÀíÁÀJðÂÄüÂ3ÄÓÂÀíÁÀóÂDþÁÀóÂðCÂÀíÁ1Ä
ÃðÚÃÀíÁ´ÂÄBð°ÄÁÀíÁ[EMAIL PROTECTED]@
--

The most likely problem is that your vmcore file doesn't match your kernel.
Are you running kgdb with the same kernel as was running when the system
crashed?  (If you don't have that kernel handy, you might as well delete
vmcore.0).

-- 
Peter Jeremy


Re: 6.0 random freezes

2005-12-12 Thread Claus Guttesen
 The load I'm talking about is less than moderate (less than 2.0 with
 plenty of CPU idle time). The freezing thing also does not appear to
 happen at peak times (I have rrdtool based CPU load graphs).

 Both machines have (almost) identical motherboards:

 Intel SE7520JR2SCSID2 and SE7520JR2ATAD2
 2 Intel XeonE 3.2GHz 800MHz CPUs
 4GB DDRII400 RegECC RAM
 Both machines boot with ACPI and hyperthreading enabled.

Try to disable HTT in the BIOS. It seldom gives you very much, and
sometimes degrades performance. Is it a webserver? If it generates
a lot of temporary files you can try adding/changing the following in
/etc/sysctl.conf:

kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.ufs.dirhash_maxmem=8388608

regards
Claus


Re: 6.0 random freezes

2005-12-12 Thread Ronald Klop

On Mon, 12 Dec 2005 22:15:55 +0100, Atanas [EMAIL PROTECTED] wrote:


Hi,

I have 3 machines running 6.0-RELEASE, and recently 2 of them started
freezing once a day or so. There are no error messages on the console or
in the system logs.


What happens if you set one of these sysctl values to 0? (This disables  
SMP changes from 5.4 to 6.0.)

debug.mpsafevfs: 1
debug.mpsafenet: 1
debug.mpsafevm: 1

And is there a possibility (performance-wise) to build a kernel with  
WITNESS and/or INVARIANTS options compiled in? This will give more info  
about possible locking problems. Your system will run slower, and because  
of this the problem may not occur anymore, but it is worth a try.


Ronald.

--
 Ronald Klop
 Amsterdam, The Netherlands


Re: 6.0 random freezes

2005-12-12 Thread Peter Jeremy
On Mon, 2005-Dec-12 13:15:55 -0800, Atanas wrote:
I have 3 machines running 6.0-RELEASE, and recently 2 of them started 
freezing once a day or so. There are no error messages on the console or 
in the system logs.

The first one I put in production about a month ago and it was working 
flawlessly until it got some load and now it started freezing almost 
every day. The second one has exactly the same behavior - it was fine 
when doing nothing (a couple of weeks), and started freezing when loaded.

Define freezing:  Does it respond to pings?  Can you switch VTYs?
Do the num-lock/caps-lock LEDs respond?  Do some processes seem to
freeze before others?

I suggest you add the following to your kernel config:
 options KDB # Enable kernel debugger support.
 options DDB # Support DDB.

When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
a serial console).

As a start, run 'show lockedvnods' and 'ps'.  My guess is that you'll
see a lock that has a number of waiters - which is probably the
culprit.  Use 'panic' or 'call doadump' to get a crashdump and then
you can use kgdb to rummage around once you reboot - see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

 makeoptions   DEBUG=-g # Build kernel with gdb(1) debug symbols

I suggest you add this back in.  Without it, you can't debug any crash
dumps that you manage to get (and add dumpdev to your rc.conf).

Now the only reasonable option for me (I mean for production and in the 
relatively short term) seems to be downgrading to 5.4 and waiting until 
6.x gets more stable

Whilst I realise that you can't have production machines freezing on
schedule, your assistance in providing more information about your
problem will help make 6.x more stable.

-- 
Peter Jeremy


Re: 6.0 random freezes

2005-12-12 Thread Atanas

Claus Guttesen said the following on 12/12/05 13:23:

Both machines boot with ACPI and hyperthreading enabled.


Try to disable HTT in bios.


I think that I already achieved that by simply disabling the acpi module
in device.hints, and it had no effect on the problem.


It seldom gives you very much, and
sometimes degrades performance. Is it a webserver?


It is a web server, and as such it tends to generate a lot of processes,
many of them independent of each other and trying to run simultaneously.
Thus more workhorses (even less powerful virtual CPUs) make the server
perform more smoothly.

This is just a practical observation though, and I could be wrong. I
would rather go with 2 dual core Opterons, but these are sort of
expensive for now.


If it generates
a lot of temporary files you can try adding/changing the following in
/etc/sysctl.conf:

kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.ufs.dirhash_maxmem=8388608


Currently I have the following:

  kern.ipc.somaxconn: 1024
  kern.maxfiles: 12328
  vfs.ufs.dirhash_maxmem: 2097152
  kern.openfiles: 1992
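
To compare these against the values Claus suggested, the Python sketch below parses both lists (sysctl.conf "name=value" lines and sysctl(8) "name: value" output, both copied from this thread) and reports which settings are currently lower than suggested. The thread does not establish whether raising them has anything to do with the freezes.

```python
# Compare current sysctl values with the suggested ones from this thread.

SUGGESTED = """\
kern.ipc.somaxconn=2048
kern.maxfiles=65536
vfs.ufs.dirhash_maxmem=8388608
"""

CURRENT = """\
kern.ipc.somaxconn: 1024
kern.maxfiles: 12328
vfs.ufs.dirhash_maxmem: 2097152
kern.openfiles: 1992
"""


def parse(text: str, sep: str) -> dict:
    """Parse 'name<sep>value' lines into {name: int(value)}, skipping comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.split(sep, 1)
        values[name.strip()] = int(value.strip())
    return values


def below_suggestion(current: dict, suggested: dict) -> dict:
    """Return {name: (current, suggested)} for settings lower than suggested."""
    return {
        name: (current[name], want)
        for name, want in suggested.items()
        if name in current and current[name] < want
    }


for name, (have, want) in below_suggestion(parse(CURRENT, ":"),
                                           parse(SUGGESTED, "=")).items():
    print(f"{name}: {have} -> {want}")
```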

Its closest relative (running 5.4-RELEASE on the same hardware) handles
about twice as many requests, temporary files, and open files.
kern.openfiles there is about 4000, and if something tries to go above
the limits, the kernel usually reports that.

I have plenty of other boxes serving at least twice as many requests with
less powerful (also hyperthreaded) CPUs running 4.x and 5.x with no
problems. The ones I have problems with are far less loaded, and are
supposedly faster.

Thanks for your suggestions!

Regards,
Atanas



Re: 6.0 random freezes

2005-12-12 Thread Atanas

Ronald Klop said the following on 12/12/05 13:27:


What happens if you set one of these sysctl values to 0? (This disables  
SMP changes from 5.4 to 6.0.)

debug.mpsafevfs: 1
debug.mpsafenet: 1
debug.mpsafevm: 1


Thanks for the suggestion!
I just did so and rebooted both machines, so we'll see.

I remember unsetting debug.mpsafenet before 5.4 due to some ipfw 
limitations, but didn't know about the other two.


And is there a possibility (performance-wise) to build a kernel with  
WITNESS and/or INVARIANTS options compiled in? This will give more info  
about possible locking problems. Your system will run slower, and 
because of this the problem may not occur anymore, but it is worth a 
try.


Neither machine is heavily loaded, so I could afford slowing them down 
a bit for a while (I hope it won't be several times slower). I will do 
that at some point later if the problem still persists.


I hope I won't be forced to downgrade to 5.4, though I'm already working 
on that (just in case).


Regards,
Atanas


Re: 6.0 random freezes

2005-12-12 Thread fredthetree
I just wanted to chime in and say I've had my 6.0-RELEASE #0 freeze up twice
in the past few days.  never once had it happen with 5.x.  everything locks,
no keyboard response, no mouse, and after several minutes it reboots itself,
and savecore starts up during boot..  and again, it's not during heavy load,
both times i was running X, just browsing ye olde internet with firefox,
some other apps running as per usual..

FreeBSD atlan.ns.ca 6.0-RELEASE FreeBSD 6.0-RELEASE #0: Thu Nov  3 09:36:13
UTC 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC  i386

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov  3 09:36:13 UTC 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel Pentium III (701.59-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x683  Stepping = 3

  Features=0x387f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory  = 536805376 (511 MB)
avail memory = 515956736 (492 MB)
ath_hal: 0.9.14.9 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413)
npx0: [FAST]
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: AWARD AWRDACPI on motherboard
acpi0: Power Button (fixed)
pci_link0: ACPI PCI Link LNKA irq 11 on acpi0
pci_link1: ACPI PCI Link LNKB irq 5 on acpi0
pci_link2: ACPI PCI Link LNKC irq 9 on acpi0
pci_link3: ACPI PCI Link LNKD irq 10 on acpi0
Timecounter ACPI-safe frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0x4008-0x400b on acpi0
cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff,0x4000-0x4041,0x5000-0x500f
on acpi0
pci0: ACPI PCI bus on pcib0
agp0: Intel 82443BX (440 BX) host to PCI bridge mem 0xe000-0xe3ff
at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 UDMA33 controller port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: ATA channel 0 on atapci0
ata1: ATA channel 1 on atapci0
uhci0: Intel 82371AB/EB (PIIX4) USB controller port 0x9000-0x901f irq 10
at device 7.2 on pci0
uhci0: [GIANT-LOCKED]
usb0: Intel 82371AB/EB (PIIX4) USB controller on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
pci0: bridge at device 7.3 (no driver attached)
pcm0: Creative EMU10K1 port 0x9400-0x941f irq 9 at device 10.0 on pci0
pcm0: TriTech TR28602 AC97 Codec
ath0: Atheros 5212 mem 0xe800-0xe800 irq 5 at device 11.0 on pci0
ath0: Ethernet address: 00:0f:3d:50:13:5c
ath0: mac 5.9 phy 4.3 radio 4.6
fdc0: floppy drive controller port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: 1440-KB 3.5 drive on fdc0 drive 0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: Standard parallel printer port port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: Parallel port bus on ppc0
plip0: PLIP network interface on ppbus0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: PS/2 Mouse irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
pmtimer0 on isa0
orm0: ISA Option ROM at iomem 0xc-0xc on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
umass0: Cowon Systems, Inc. iAUDIO M3 Digital Audio Player, rev 2.00/1.00,
addr 2
ugen0: EPSON EPSON Scanner, rev 2.00/1.10, addr 3
Timecounter TSC frequency 701594095 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 8063MB FUJITSU MPD3084AT DD-03-47 at ata0-master UDMA33
ad1: 57241MB WDC WD600BB-32CXA0 02.05B02 at ata0-slave UDMA33
acd0: CDRW HL-DT-ST GCE-8525B/1.03 at ata1-master UDMA33
da0 at umass-sim0 bus 0 target 0 lun 0
da0: TOSHIBA MK2004GAL JC10 Fixed Direct Access SCSI-0 device
da0: 1.000MB/s transfers
da0: 19073MB (39063024 512 byte sectors: 255H 63S/T 2431C)
Trying to mount root from ufs:/dev/ad0s1a



On 12/12/05, Atanas [EMAIL PROTECTED] wrote:

 Ronald Klop said the following on 12/12/05 13:27:
 
  What happens if you set one of these sysctl values to 0? (This disables
  SMP changes from 5.4 to 6.0.)
  debug.mpsafevfs: 1
  debug.mpsafenet: 1
  debug.mpsafevm: 1
 
 Thanks for the suggestion!
 I just did so and rebooted both machines, so we'll see.

 

Re: 6.0 random freezes

2005-12-12 Thread Atanas

Peter Jeremy said the following on 12/12/05 13:40:


Define freezing:  Does it respond to pings?  Can you switch VTYs?
Do the num-lock/caps-lock LEDs respond?  Do some processes seem to
freeze before others?


I used the word "freeze" instead of "crash", because the latter often
gets associated with errors reported by the kernel in system logs or
on the console. In this case there are absolutely no error messages.
I also have remote logging enabled (to another machine over the
network), but there's nothing there either.


When it happens, the server appears to respond to pings for the first
few minutes, but then everything stays down until I get to the data center.

When I plug in a keyboard, there's no response at all - no LEDs, no VTY 
switching, no Ctrl-Alt-Esc, etc. You might think of hint.atkbd.0.flags 
not being set properly, but it's right (i.e. unchanged; it appears to 
default to that on i386 5.x+) and other machines with identical 
configuration do accept a keyboard.

I have no information about processes. The only thing I have is a 
real-time CPU load graph. I have a script tailing the output of 
"vmstat cpu 15" and drawing a graph with user/system/idle times, and 
according to that graph there are no load spikes or unusual variations 
before the crashes. The usual user/system/idle percentages look like 
10/7/83.
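
A Python sketch of the graphing script's parsing step, assuming FreeBSD vmstat(8) output in which the last three columns of each data line are the "us sy id" CPU percentages. The sample lines are made up (only the 10/7/83 split comes from the text above); in practice the lines would be read from the running vmstat rather than a list.

```python
# Extract user/system/idle CPU percentages from vmstat(8) output, assuming
# each data line ends with the "us sy id" columns. Sample lines are made up.


def cpu_split(line: str):
    """Return (user, system, idle) from a vmstat data line, or None for headers."""
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        us, sy, idle = (int(f) for f in fields[-3:])
    except ValueError:
        return None   # header lines end in words such as "us sy id"
    return us, sy, idle


sample = [
    " procs      memory      page                    disks     faults      cpu",
    " r b w     avm    fre  flt  re  pi  po  fr  sr ad0 ad1   in   sy  cs us sy id",
    " 0 0 0  123456  45678   12   0   0   0  10   0   0   0  300  400 200 10  7 83",
]
for line in sample:
    split = cpu_split(line)
    if split:
        print("user/system/idle = %d/%d/%d" % split)
```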



I suggest you add the following to your kernel config:
 options KDB # Enable kernel debugger support.
 options DDB # Support DDB.


I just set these along with the DEBUG option below, and got the new
kernel (from 6.0-RELEASE sources dated Dec 9) running on both machines,
so we'll see.


When it hangs, break into DDB (Ctrl-Alt-Esc on the console or BREAK on
a serial console).

As a start, run 'show lockedvnods' and 'ps'.  My guess is that you'll
see a lock that has a number of waiters - which is probably the
culprit.  Use 'panic' or 'call doadump' to get a crashdump and then
you can use kgdb to rummage around once you reboot - see
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html


I don't have any experience in chasing kernel bugs, so I'm not sure
whether I would be able to get something useful, but I'll try that on 
the next crash. But if I have no keyboard response I won't be able to 
save it, right?


I do not know what a serial console is and would need some time to get 
familiar with it. Would I get something in addition to what I can get 
from the standard console?



 makeoptions   DEBUG=-g # Build kernel with gdb(1) debug symbols


I suggest you add this back in.  Without it, you can't debug any crash
dumps that you manage to get (and add dumpdev to your rc.conf).


My bad; I realized that it's fairly harmless, but only weeks after I 
put the box into production. It's back in there now.

The dumpdev variable seems to default to AUTO, i.e. trying to use the 
first swap device if it's bigger than the RAM (in my case yes), so I 
guess I don't need to touch it.
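
The dumpdev=AUTO behaviour as described here can be modelled as: take the first swap device and use it only if it can hold a full dump of physical memory (for example, the panic at the top of this page dumped 511 MB on a machine with 511 MB of RAM). The Python sketch below encodes that description only; it is not the actual rc.d logic, and the device names and sizes are hypothetical.

```python
# Model of dumpdev=AUTO as described in the message above: use the first
# swap device, but only if it is at least as large as physical memory.
# This is not the real rc.d script; devices and sizes are hypothetical.

GiB = 1024 ** 3


def choose_dumpdev(swap_devices, physmem_bytes: int):
    """swap_devices: list of (device, size_in_bytes) in fstab order."""
    if not swap_devices:
        return None
    device, size = swap_devices[0]          # only the first swap device counts
    return device if size >= physmem_bytes else None


swaps = [("/dev/da0s1b", 8 * GiB)]          # hypothetical swap partition
print(choose_dumpdev(swaps, physmem_bytes=4 * GiB))  # /dev/da0s1b
```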



Whilst I realise that you can't have production machines freezing on
schedule, your assistance in providing more information about your
problem will help make 6.x more stable.


Yes, I know and I will try. Today I already had a couple of crashes
(got lucky, no nasty data corruptions this time), and I cannot afford 
this to continue.


I'm already working on the downgrade, but most likely I will have at 
least one of these 2 machines still running 6.x during the next day or two.


After the downgrade we could eventually set up a test bed and start 
hammering it with requests. The problem would be how to trigger the 
crash and whether we would be able to reproduce it at all.


Thanks for the prompt reply!

Regards,
Atanas


Re: 6.0 random freezes

2005-12-12 Thread Peter Jeremy
On Mon, 2005-Dec-12 22:21:52 -0400, fredthetree wrote:
I just wanted to chime in and say I've had my 6.0-RELEASE #0 freeze up twice
in the past few days.  never once had it happen with 5.x.  everything locks,
no keyboard response, no mouse, and after several minutes it reboots itself,
and savecore starts up during boot..

This suggests you've had a panic (or something that develops into
one).  If you've got a crashdump, you can probably get enough
information out of it for people to get an idea of what is wrong.
Please have a look at:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html

-- 
Peter Jeremy


Re: 6.0 random freezes

2005-12-12 Thread Atanas

Atanas said the following on 12/12/05 15:43:

Ronald Klop said the following on 12/12/05 13:27:


What happens if you set one of these sysctl values to 0? (This 
disables  SMP changes from 5.4 to 6.0.)

debug.mpsafevfs: 1
debug.mpsafenet: 1
debug.mpsafevm: 1


Thanks for the suggestion!
I just did so and rebooted both machines, so we'll see.


(replying to myself)
... and coincidentally or not, I got the next crash in less than 10 
minutes :-(


After the crash it ran for longer, until I rebooted it after rebuilding 
the kernel with the debugging options. Before the reboot I commented 
these out (i.e. set them back to 1), and now I'm waiting for a crashdump.


Regards,
Atanas