Re: -current lockups
On Tue, 31 Jul 2001, John Baldwin wrote:

> On 31-Jul-01 Vincent Poy wrote:
>> On Mon, 30 Jul 2001, John Baldwin wrote:
>>> On 30-Jul-01 Sheldon Hearn wrote:
>>>> On Mon, 30 Jul 2001 07:38:47 MST, David O'Brien wrote:
>>>>> However, those boxes were panicking often before I made that
>>>>> statement. So I still believe current is now in better shape than
>>>>> it was in June.
>>>>
>>>> I'll be a lot happier when I can enable DDB_UNATTENDED and do
>>>> whatever it is that causes my panic of the day and actually get a
>>>> crashdump instead of
>>>>
>>>> panic: witness_restore: lock (sleep mutex) Giant not locked
>>>
>>> This is a different one. Is this during the dump itself? That I can
>>> try to work on. (Basically, I need to make witness just stop doing all
>>> of its various checks if panicstr != NULL.)
>>
>> I'm getting the following lock order reversal with any -current since
>> July 19, 2001, including today's, and it just hangs solid after this, no
>> db prompt or anything... It only happens after passwd or chpass
>> successfully rebuilds the database; vipw works fine.
>>
>> root@pele [9:29pm][/usr/temp]
>> Jul 28 21:29:40 pele /boot/kernel/kernel: lock order reversal
>> Jul 28 21:29:40 pele /boot/kernel/kernel: lock order reversal
>> Jul 28 21:29:40 pele /boot/kernel/kernel: 1st 0xd92fea9c process lock @ /usr/src/sys/vm/vm_glue.c:469
>> Jul 28 21:29:40 pele /boot/kernel/kernel: 1st 0xd92fea9c process lock @ /usr/src/sys/vm/vm_glue.c:469
>> Jul 28 21:29:40 pele /boot/kernel/kernel: 2nd 0xc118dfb0 lockmgr interlock @ /usr/src/sys/kern/kern_lock.c:239
>> Jul 28 21:29:40 pele /boot/kernel/kernel: 2nd 0xc118dfb0 lockmgr interlock @ /usr/src/sys/kern/kern_lock.c:239
>
> This is due to the way that lockmgr locks are implemented,
> unfortunately, and will be fixed when vm maps switch to sx locks
> instead of lockmgr locks.

Just a note to say thanks to John Baldwin, Peter Wemm, Ian Dowse and a few others for all their hard work and code commits, since the panics, both the general stability ones and the one from running passwd, have completely disappeared. The system is solid as a rock! Thanks guys!
Cheers,
Vince - [EMAIL PROTECTED] - Vice President
Unix Networking Operations - FreeBSD-Real Unix for Free
WurldLink Corporation
San Francisco - Honolulu - Hong Kong
HongKong Stars/Gravis UltraSound Mailing Lists Admin
Almighty1@IRC - oahu.DAL.NET Hawaii's DALnet IRC Network Server Admin

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-current" in the body of the message
Re: -current lockups
On Tue, 31 Jul 2001 12:06:49 -1000, Vincent Poy wrote:

> Yeah, that's the weird part... I thought adding DDB_UNATTENDED as an
> option would at least make it reboot or something...

For the record, DDB_UNATTENDED is mostly pointless. It just sets the default value of debug.debugger_on_panic, which you can just as well set in /etc/sysctl.conf. Unless, of course, you're seeing a panic in the startup process. But then do you really want an indefinite panic cycle? :-)

Ciao,
Sheldon.
Re: -current lockups
On Wed, 1 Aug 2001, Sheldon Hearn wrote:

:On Tue, 31 Jul 2001 12:06:49 -1000, Vincent Poy wrote:
:
:	Yeah, that's the weird part... I thought adding DDB_UNATTENDED
:	as an option would at least make it reboot or something...
:
:For the record, DDB_UNATTENDED is mostly pointless. It just sets the
:default value of debug.debugger_on_panic, which you can just as well set
:in /etc/sysctl.conf. Unless, of course, you're seeing a panic in the
:startup process. But then do you really want an indefinite panic cycle?
::-)

Well, my current startup panic only happens at cold boot. After it panics the first time, it boots fine. If DDB_UNATTENDED isn't set, it hangs trying to enter DDB.

--
[EMAIL PROTECTED]
Bipedalism is only a fad.
Re: -current lockups
On Tue, 31 Jul 2001, John Baldwin wrote:

> On 31-Jul-01 Vincent Poy wrote:
>> [...]
>> I'm getting the following lock order reversal with any -current since
>> July 19, 2001, including today's, and it just hangs solid after this, no
>> db prompt or anything... It only happens after passwd or chpass
>> successfully rebuilds the database; vipw works fine.
>> [...]
>
> This is due to the way that lockmgr locks are implemented,
> unfortunately, and will be fixed when vm maps switch to sx locks
> instead of lockmgr locks.

Interesting. Is there a workaround so that it just reboots instead of freezing? Also, I noticed that you committed some changes to the kernel; are they supposed to help with this at all?
Cheers,
Vince
Re: -current lockups
On 31-Jul-01 Vincent Poy wrote:
> On Tue, 31 Jul 2001, John Baldwin wrote:
>> [...]
>> This is due to the way that lockmgr locks are implemented,
>> unfortunately, and will be fixed when vm maps switch to sx locks
>> instead of lockmgr locks.
>
> Interesting. Is there a workaround so that it just reboots instead of
> freezing? Also, I noticed that you committed some changes to the
> kernel; are they supposed to help with this at all?

There is currently not a workaround. The changes committed fix other things, but not this problem. I haven't actually seen this lock order cause a freeze before, to be honest.

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: -current lockups
On Tue, 31 Jul 2001, John Baldwin wrote:

> On 31-Jul-01 Vincent Poy wrote:
>> On Tue, 31 Jul 2001, John Baldwin wrote:
>>> [...]
>>> This is due to the way that lockmgr locks are implemented,
>>> unfortunately, and will be fixed when vm maps switch to sx locks
>>> instead of lockmgr locks.
>>
>> Interesting. Is there a workaround so that it just reboots instead of
>> freezing? Also, I noticed that you committed some changes to the
>> kernel; are they supposed to help with this at all?
>
> There is currently not a workaround. The changes committed fix other
> things, but not this problem. I haven't actually seen this lock order
> cause a freeze before, to be honest.

Yeah, that's the weird part... I thought adding DDB_UNATTENDED as an option would at least make it reboot or something...

Cheers,
Vince
Re: -current lockups
On 31-Jul-01 Vincent Poy wrote:
> On Tue, 31 Jul 2001, John Baldwin wrote:
>> On 31-Jul-01 Vincent Poy wrote:
>>> [...]
>>> Interesting. Is there a workaround so that it just reboots instead of
>>> freezing? Also, I noticed that you committed some changes to the
>>> kernel; are they supposed to help with this at all?
>>
>> There is currently not a workaround. The changes committed fix other
>> things, but not this problem. I haven't actually seen this lock order
>> cause a freeze before, to be honest.
>
> Yeah, that's the weird part... I thought adding DDB_UNATTENDED as an
> option would at least make it reboot or something...

Well, since it is a lock order reversal, there is a chance of it resulting in a deadlock, though on a UP machine that would be very, very rare indeed. The reversal in question is triggered when we swap a process out.

--
John Baldwin
Re: -current lockups
On Tue, 31 Jul 2001, John Baldwin wrote:

> On 31-Jul-01 Vincent Poy wrote:
>> [...]
>> Yeah, that's the weird part... I thought adding DDB_UNATTENDED as an
>> option would at least make it reboot or something...
>
> Well, since it is a lock order reversal, there is a chance of it
> resulting in a deadlock, though on a UP machine that would be very,
> very rare indeed. The reversal in question is triggered when we swap a
> process out.

Yep, it's so rare that nothing triggers it except passwd and chpass, after they successfully exit having done the following:

passwd: updating the database...
passwd: done

Even vipw doesn't trigger it, which I thought it would, since it processes all the users rather than just one.
Cheers,
Vince
Re: -current lockups
On Mon, 30 Jul 2001, John Baldwin wrote:

> On 30-Jul-01 Sheldon Hearn wrote:
>> On Mon, 30 Jul 2001 07:38:47 MST, David O'Brien wrote:
>>> However, those boxes were panicking often before I made that
>>> statement. So I still believe current is now in better shape than it
>>> was in June.
>>
>> I'll be a lot happier when I can enable DDB_UNATTENDED and do whatever
>> it is that causes my panic of the day and actually get a crashdump
>> instead of
>>
>> panic: witness_restore: lock (sleep mutex) Giant not locked
>
> This is a different one. Is this during the dump itself? That I can try
> to work on. (Basically, I need to make witness just stop doing all of
> its various checks if panicstr != NULL.)

I'm getting the following lock order reversal with any -current since July 19, 2001, including today's, and it just hangs solid after this, no db prompt or anything... It only happens after passwd or chpass successfully rebuilds the database; vipw works fine.

root@pele [9:29pm][/usr/temp]
Jul 28 21:29:40 pele /boot/kernel/kernel: lock order reversal
Jul 28 21:29:40 pele /boot/kernel/kernel: lock order reversal
Jul 28 21:29:40 pele /boot/kernel/kernel: 1st 0xd92fea9c process lock @ /usr/src/sys/vm/vm_glue.c:469
Jul 28 21:29:40 pele /boot/kernel/kernel: 1st 0xd92fea9c process lock @ /usr/src/sys/vm/vm_glue.c:469
Jul 28 21:29:40 pele /boot/kernel/kernel: 2nd 0xc118dfb0 lockmgr interlock @ /usr/src/sys/kern/kern_lock.c:239
Jul 28 21:29:40 pele /boot/kernel/kernel: 2nd 0xc118dfb0 lockmgr interlock @ /usr/src/sys/kern/kern_lock.c:239

Cheers,
Vince
Re: -current lockups
On Mon, 30 Jul 2001 16:52:27 +0200, Sheldon Hearn wrote:

> I'll be a lot happier when I can enable DDB_UNATTENDED and do whatever
> it is that causes my panic of the day and actually get a crashdump
> instead of
>
> panic: witness_restore: lock (sleep mutex) Giant not locked

Right! We have interesting progress!

The following patchset fixes two problems:

1) The witness code may still panic when panicstr is not NULL. This is a problem when the original panic was caused by the witness code itself.

   Solution: when panicstr != NULL, witness backs off on the fascism.

2) The ata code calls await/mawait inside a dump. This results in a hard, uninterruptible lock in ad_reinit(). An NMI might interrupt it, but not all of us have NMI boards.

   Solution: as a quick fix, when panicstr != NULL, bail out of masleep() without spinning on mutexes. The real solution here is arguably for the ata code to be aware that sleeping inside a dump (i.e. when panicstr != NULL) is bad, and to stop doing so.

With these fixes in place, I can get a crashdump with a populated ktr_buf from a witness-related panic. Joy!

Many thanks to jhb for providing the patch for #1 above and to peter for the patch for #2 above. None of this is actually my own work. I'm just the guy pressing the panic button on his box incessantly. :-)

Ciao,
Sheldon.
Index: kern/kern_mutex.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_mutex.c,v
retrieving revision 1.64
diff -u -d -r1.64 kern_mutex.c
--- kern/kern_mutex.c	2001/06/25 18:29:32	1.64
+++ kern/kern_mutex.c	2001/07/30 23:14:32
@@ -562,6 +562,9 @@
 void
 _mtx_assert(struct mtx *m, int what, const char *file, int line)
 {
+
+	if (panicstr != NULL)
+		return;
 	switch (what) {
 	case MA_OWNED:
 	case MA_OWNED | MA_RECURSED:
Index: kern/kern_synch.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.148
diff -u -d -r1.148 kern_synch.c
--- kern/kern_synch.c	2001/07/06 01:16:42	1.148
+++ kern/kern_synch.c	2001/07/31 01:07:23
@@ -554,6 +554,18 @@
 	KASSERT(timo > 0 || mtx_owned(&Giant) || mtx != NULL,
 	    ("sleeping without a mutex"));
 	mtx_lock_spin(&sched_lock);
+	if (cold || panicstr) {
+		/*
+		 * After a panic, or during autoconfiguration,
+		 * just give interrupts a chance, then just return;
+		 * don't run any other procs or panic below,
+		 * in case this is the idle process and already asleep.
+		 */
+		if (mtx != NULL && priority & PDROP)
+			mtx_unlock_flags(mtx, MTX_NOSWITCH);
+		mtx_unlock_spin(&sched_lock);
+		return (0);
+	}
 	DROP_GIANT_NOSWITCH();
 	if (mtx != NULL) {
 		mtx_assert(mtx, MA_OWNED | MA_NOTRECURSED);
Index: kern/subr_trap.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_trap.c,v
retrieving revision 1.196
diff -u -d -r1.196 subr_trap.c
--- kern/subr_trap.c	2001/07/04 15:36:30	1.196
+++ kern/subr_trap.c	2001/07/30 23:14:42
@@ -72,9 +72,9 @@
 	while ((sig = CURSIG(p)) != 0)
 		postsig(sig);
 	mtx_unlock(&Giant);
+	PROC_UNLOCK(p);
 
 	mtx_lock_spin(&sched_lock);
-	PROC_UNLOCK_NOSWITCH(p);
 	p->p_pri.pri_level = p->p_pri.pri_user;
 	if (resched_wanted(p)) {
 		/*
@@ -96,24 +96,22 @@
 		while ((sig = CURSIG(p)) != 0)
 			postsig(sig);
 		mtx_unlock(&Giant);
-		mtx_lock_spin(&sched_lock);
-		PROC_UNLOCK_NOSWITCH(p);
-	}
+		PROC_UNLOCK(p);
+	} else
+		mtx_unlock_spin(&sched_lock);
 
 	/*
 	 * Charge system time if profiling.
 	 */
-	if (p->p_sflag & PS_PROFIL) {
-		mtx_unlock_spin(&sched_lock);
+	if (p->p_sflag & PS_PROFIL)
 		addupc_task(p, TRAPF_PC(frame),
 		    (u_int)(p->p_sticks - oticks) * psratio);
-	} else
-		mtx_unlock_spin(&sched_lock);
 }
 
 /*
  * Process an asynchronous software trap.
  * This is relatively easy.
+ * This function will return with interrupts disabled.
  */
 void
 ast(framep)
@@ -121,68 +119,64 @@
 {
 	struct proc *p = CURPROC;
 	u_quad_t sticks;
+	critical_t s;
+	int sflag;
 #if defined(DEV_NPX) && !defined(SMP)
 	int ucode;
 #endif
 
 	KASSERT(TRAPF_USERMODE(framep), ("ast in kernel mode"));
-
-	/*
-	 * We check for a pending AST here rather than in the assembly as
-	 * acquiring and releasing mutexes in assembly is not fun.
-	 */
-	mtx_lock_spin(&sched_lock);
-	if (!(astpending(p) || resched_wanted(p))) {
Re: -current lockups
On 31-Jul-01 Vincent Poy wrote:
> On Mon, 30 Jul 2001, John Baldwin wrote:
>> [...]
>> This is a different one. Is this during the dump itself? That I can
>> try to work on. (Basically, I need to make witness just stop doing all
>> of its various checks if panicstr != NULL.)
>
> I'm getting the following lock order reversal with any -current since
> July 19, 2001, including today's, and it just hangs solid after this, no
> db prompt or anything... It only happens after passwd or chpass
> successfully rebuilds the database; vipw works fine.
> [...]

This is due to the way that lockmgr locks are implemented, unfortunately, and will be fixed when vm maps switch to sx locks instead of lockmgr locks.

--
John Baldwin
Re: -current lockups
On Sun, Jul 29, 2001 at 07:00:11PM -0700, Kris Kennaway wrote:
> For the past 2 or 3 weeks my -current system has been experiencing
> temporary lockups, usually under disk load. The entire system will hang
> for around 20-30 seconds, during which time absolutely no
> network/IO/keyboard/mouse activity is accepted. Usually, after 20-30
> seconds the system will unwedge and activity will resume, but sometimes
> it hangs forever. There are no console messages logged by this event. I
> cannot break into DDB until after system activity resumes normally.

I am also experiencing total wedging on disk activity ("vi foo" was one trigger) on a SCSI system since I updated late last week. My May 7th kernel was rock solid.

--
-- David ([EMAIL PROTECTED])
Re: -current lockups
On Mon, 30 Jul 2001 09:28:09 MST, John Baldwin wrote:

>> panic: witness_restore: lock (sleep mutex) Giant not locked
>
> This is a different one. Is this during the dump itself? That I can try
> to work on. (Basically, I need to make witness just stop doing all of
> its various checks if panicstr != NULL.)

Oh cool! Yes, this is during the dump. I get the witness panic, followed by something along the lines of "dump already in progress, bailing".

Ciao,
Sheldon.
Re: -current lockups
On Mon, 30 Jul 2001 07:26:55 MST, David O'Brien wrote:

> I am also experiencing total wedging on disk activity ("vi foo" was one
> trigger) on a SCSI system since I updated late last week. My May 7th
> kernel was rock solid.

Was this before or after you posted publicly that -CURRENT seemed stable and that now is a good time to upgrade if you've been holding back? :-)

Ciao,
Sheldon.
Re: -current lockups
On Mon, 30 Jul 2001 07:38:47 MST, David O'Brien wrote:

> However, those boxes were panicking often before I made that statement.
> So I still believe current is now in better shape than it was in June.

I'll be a lot happier when I can enable DDB_UNATTENDED and do whatever it is that causes my panic of the day and actually get a crashdump instead of

panic: witness_restore: lock (sleep mutex) Giant not locked

:-)

Fortunately, jhb has said he'll try to take a look at this some time this week. However, if I hadn't interacted with the guy directly, I'd be pretty frustrated with -CURRENT.

Ciao,
Sheldon.
Re: -current lockups
This has happened on and off for various platforms since SMPng.

On Sun, 29 Jul 2001, Kris Kennaway wrote:

> Hi all,
>
> For the past 2 or 3 weeks my -current system has been experiencing
> temporary lockups, usually under disk load. The entire system will hang
> for around 20-30 seconds, during which time absolutely no
> network/IO/keyboard/mouse activity is accepted. Usually, after 20-30
> seconds the system will unwedge and activity will resume, but sometimes
> it hangs forever. There are no console messages logged by this event. I
> cannot break into DDB until after system activity resumes normally.
>
> My system is a PPro 233 using IDE drives. Has anyone else seen
> something like this?
>
> Kris
Re: current lockups
> Compiling Mozilla with make -j 2 got -current to lock up, twice in
> succession. I'm running a fairly recent snapshot (a week or two old) on
> a Dual Celeron box (BP6) with UDMA66 enabled. The kernel had DDB
> enabled. I was running X, but I didn't see any signs of the kernel
> attempting to get into the debugger.
>
> Has this been fixed? Is anyone interested in investigating? I'll post
> more info if I find anything.

Another data point: I had another lockup today. I left the box doing a buildworld and went out for dinner. When I returned, the machine had locked up tight, but the orange LED on the disk was on. Now, I don't know if the problem is my WDC 20 GB disk or something else in the ATA driver. I'm running on the RELENG_4 branch as of last night.

-Arun
Re: current lockups
> On 2000-Mar-09 10:05:21 +1100, Peter Dufault [EMAIL PROTECTED] wrote:
>> There's no difference between rtprio and P1003.1B scheduling other
>> than the name. rtprio is the same as P1003.1B "SCHED_RR".
>
> I wasn't aware of that.
>
>> I'd like to remove the rtprio call from ntpd. I think we ought to do
>> it now before 4.0 ships.
>
> Given there is a known priority inversion bug related to realtime (or
> idle) scheduling, it would seem wise not to use it in any system
> utilities. The relevant patch would appear to be (untested):
>
> --- /usr/src/usr.sbin/ntp/config.h	Tue Feb 1 13:56:05 2000
> +++ /tmp/config.h	Thu Mar 9 11:46:11 2000

You have to do something in the "./configure" stuff. Hopefully someone in the know can suggest the "--with-no-foobar" option needed on the command line so I don't have to wade into it.

Autoconfiguring POSIX realtime is a bad, bad idea because:

1. You don't know if it is available in all environments;
2. You don't know who is allowed to use it;
3. You don't know what the heck it does. It decidedly does not mean "run as fast as you can".

Peter

--
Peter Dufault ([EMAIL PROTECTED])   Realtime development, Machine control,
HD Associates, Inc.               Fail-Safe systems, Agency approval
Re: current lockups
It is rumoured that Vallo Kallaste had the courage to say:

> I had a lockup yesterday while stress-testing a new SMP machine: a Tyan
> motherboard with the Intel GX chipset, 256MB of memory, and one 20GB IBM
> UDMA66 disk, but running at UDMA33. All power management is disabled
> completely in the BIOS. I was doing massive parallel compiles of GENERIC
> kernels. I left the machine doing this overnight, and in the morning the
> console had about 20 'microuptime() went backwards' messages. I was able
> to switch vtys but not log in, the machine responded to pings, and there
> was no disk activity. I'm using the ata driver and only one unusual
> kernel option, HZ=1000.

Your symptoms are not the same as mine. In my case the lockups are complete: no switching of vts, no pings, nothing at all. I never saw any "microuptime() went backwards" messages either. But then again, I never had the machine lock up on the console; I was usually logged in over the network or working in X.

Regards,

Dave Boers.
--
Dave Boers                      djb @ relativity . student . utwente . nl
Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
On 2000-Mar-07 06:29:17 +1100, Dave Boers [EMAIL PROTECTED] wrote:
> It is rumoured that Arun Sharma had the courage to say:
>> Compiling Mozilla with make -j 2 got -current to lock up, twice in
>> succession. I'm running a fairly recent snapshot (a week or two old)
>> on a Dual Celeron box (BP6) with UDMA66 enabled.
>
> Finally. I've been complaining about this on several occasions. I'm also
> running UDMA66 and Dual Celeron BP6. No overclocking.

Later postings mention possible problems with UDMA66. The other possibility that has been discussed recently is potential priority inversions for processes using rtprio and idprio.

Note that ntpd will use rtprio if the POSIX P1003.1b extensions aren't enabled in the kernel. (These were enabled by default in GENERIC on i386 in mid-January.) If you have the new ntpd (rather than xntpd) and are running a kernel without options P1003_1B, _KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could potentially get a lockup due to a priority inversion. (Though I think the probability is very small.)

Peter
Re: current lockups
:On 2000-Mar-07 06:29:17 +1100, Dave Boers [EMAIL PROTECTED] wrote:
:It is rumoured that Arun Sharma had the courage to say:
: Compiling Mozilla with make -j 2 got -current to lock up, twice in
: succession. I'm running a fairly recent snapshot (a week or two old)
: on a Dual Celeron box (BP6) with UDMA66 enabled.
:
:Finally. I've been complaining about this on several occasions. I'm also
:running UDMA66 and Dual Celeron BP6. No overclocking.
:
:Later postings mention possible problems with UDMA66. The other
:possibility that has been discussed recently is potential priority
:inversions for processes using rtprio and idprio.
:
:Note that ntpd will use rtprio if the POSIX P1003.1b extensions aren't
:enabled in the kernel. (These were enabled by default in GENERIC on
:i386 in mid-January.) If you have the new ntpd (rather than xntpd)
:and are running a kernel without options P1003_1B,
:_KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could
:potentially get a lockup due to a priority inversion. (Though I
:think the probability is very small.)
:
:Peter

p.s. The first thing anyone having potential IDE problems should do is try the older 'wd' driver (if it supports their chipset) and see if that solves the problem. At least then we can focus on the precise location of the problem.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: current lockups
On Mon, Mar 06, 2000 at 08:27:18PM +0100, Dave Boers [EMAIL PROTECTED] wrote:

> I'm interested in the fix, of course :-) But where to start looking?
> I've had three lockups so far (none before January 2000) but I didn't
> find anything that reliably triggered it.

I had a lockup yesterday while stress-testing a new SMP machine: a Tyan motherboard with the Intel GX chipset, 256MB of memory, and one 20GB IBM UDMA66 disk, but running at UDMA33. All power management is disabled completely in the BIOS. I was doing massive parallel compiles of GENERIC kernels. I left the machine doing this overnight, and in the morning the console had about 20 'microuptime() went backwards' messages. I was able to switch vtys but not log in, the machine responded to pings, and there was no disk activity. I'm using the ata driver and only one unusual kernel option, HZ=1000.

--
Vallo Kallaste
[EMAIL PROTECTED]
Re: current lockups
I'll second this email... My computer had been stable all winter (with setiathome running full time) but suddenly, come the Australian summer, it started freezing. Not panicking, just totally freezing under load. I could reproduce it by trying to build the whole of KDE, and each time it was a freeze, never a panic. Windows 98 was freezing too, but I didn't think that was abnormal ;/ It turned out to be heat related, as the machine is now stable after I installed a case fan (before, I only had the power supply fan and CPU fan). I see that the internal case temperature still gets up to about 50 or 51 degrees Celsius, whereas it was getting to 52 degrees before. Note that I AM overclocking a Celeron 300a to 450 MHz by running with a 100 MHz FSB instead of 66 MHz, so I suppose I shouldn't be surprised at the need for better cooling. As I'd prefer better CPU cooling to the case fan on the grounds of noise, can people recommend good CPU fans (over the standard Intel retail version Celeron 300a fan)? How about these Peltier (sp ?) cooling devices I have heard about? On Sun, 5 Mar 2000, Dan Papasian wrote: 1. Is your computer overclocked? 2. Is the computer totally frozen? (i.e. scroll lock doesn't turn the light on) 3. Does similar load crash the box as well? (try make -j2 world) 4. Does it freeze in the same spot? 5. Is the computer not responding to pings? If you've answered yes to a good number of these questions, there is a good chance that your processor(s) are overheating. Try improving the airflow to the case (but using a household fan isn't recommended due to EMI). -Dan Papasian [EMAIL PROTECTED] On Sat, Mar 04, 2000 at 11:50:10PM -0800, Arun Sharma wrote: Compiling Mozilla with make -j 2 got -current to lock up, twice in succession. I'm running a fairly recent snapshot (a week or two old) on a Dual celeron box (BP6) with UDMA66 enabled. The kernel had DDB enabled. I was running X, but I didn't see any signs of the kernel attempting to get into the debugger.
Has this been fixed ? Is anyone interested in investigating ? I'll post more info if I find anything. -Arun
Re: current lockups
It is rumoured that Arun Sharma had the courage to say: Compiling Mozilla with make -j 2 got -current to lock up, twice in succession. I'm running a fairly recent snapshot (a week or two old) on a Dual celeron box (BP6) with UDMA66 enabled. Finally. I've been complaining about this on several occasions. I'm also running UDMA66 and Dual Celeron BP6. No overclocking. The kernel had DDB enabled. I was running X, but I didn't see any signs of the kernel attempting to get into the debugger. Ditto here. Has this been fixed ? Is anyone interested in investigating ? I'll post more info if I find anything. I'm interested in the fix, of course :-) But where to start looking? I've had three lockups so far (none before January 2000) but I didn't find anything that reliably triggered it. Regards, Dave. -- Dave Boers djb @ relativity . student . utwente . nl Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
On Mon, Mar 06, 2000 at 08:27:18PM +0100, Dave Boers wrote: Has this been fixed ? Is anyone interested in investigating ? I'll post more info if I find anything. I'm interested in the fix, of course :-) But where to start looking? I've had three lockups so far (none before January 2000) but I didn't find anything that reliably triggered it. The cooling theory sounds the most plausible so far. I'm not overclocking my CPUs (Celeron 366s) and have appropriate cooling installed. But the machine is kept in a small room with a bunch of other machines and gets a bit warm at times. There has been no reproducible case of locking up. Each one looks different, but most were triggered by heavy compilation and I/O. One was a lockup overnight with no activity on the system. When it happens, it does not respond to pings or scroll lock. If you'd like to do something about it, working on getting a reproducible hang would be the most beneficial thing. -Arun
Re: current lockups
It is rumoured that Arun Sharma had the courage to say: The cooling theory sounds the most plausible so far. I'm not overclocking my CPUs (Celeron 366s) and have appropriate cooling installed. But the machine is kept in a small room with a bunch of other machines and gets a bit warm at times. My system has been running at 50 degrees Celsius for the past half year or so, yet the lockups only started occurring around January 2000. Once again, my system is not overclocked and the temperature is well within Intel's and Abit's temperature specifications, so there shouldn't be hardware problems. There has been no reproducible case of locking up. Each one looks different, but most were triggered by heavy compilation and I/O. One was a lockup overnight with no activity on the system. When it happens, it does not respond to pings or scroll lock. Most of my lockups occurred when the system was relatively idle. Mostly they happened only after 9-11 days of uptime. As you say, each one looks different and there doesn't seem to be a pattern to it. When it locks up, there is no response to the console, the network or the serial terminal. Only the reset button is obeyed. I have DDB in my kernel, but there's no getting into it. Also, there are no log messages of any kind from just before the lockups. If you'd like to do something about it, working on getting a reproducible hang would be the most beneficial thing. That's what I have been trying to do for the past few weeks, but I can't seem to trigger it. Uptime is now 2 days and I intend to let it run to 12 or so before running make installworld again, to see if I can reproduce it. However, I did recently change from UDMA66 to a U2W SCSI disk for my main partitions (/, /usr, /var, /tmp and swap). That may have an impact on the situation, and it is the reason for the short uptime. If the problem has gone away now, it might indicate something with the ATA driver. I'll keep you informed.
Since the disk change I've been putting my system under some heavy load from time to time (like building three large ports and make -j 12 buildworld at the same time). So far, the system is quite stable. Regards, Dave Boers. -- Dave Boers djb @ relativity . student . utwente . nl Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
The cooling theory sounds the most plausible so far. I'm not over clocking my CPUs (Celeron 366s) and have appropriate cooling installed. But the machine is kept in a small room, with a bunch of other machines and gets a bit warm at times. I have seen a couple of suggestions that this may not be the CPUs - but that the 82443BX chip (the one with the large green cooling fin) doesn't always get sufficient cooling on a BP6 board. Some thermal compound between the 82443BX and the cooling fin may be a good idea. Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]
Re: current lockups
On Mon, Mar 06, 2000 at 08:27:18PM +0100, Dave Boers wrote: on a Dual celeron box (BP6) with UDMA66 enabled. Finally. I've been complaining about this on several occasions. I'm also running UDMA66 and Dual Celeron BP6. No overclocking. Can you people reproduce this on a kernel without SMP enabled? Perhaps there is a locking issue? However, that'd lead to a panic, I'd imagine. So see if you can reproduce this with one CPU running, so we can at least eliminate one of the variables. -Dan Papasian [EMAIL PROTECTED]
Re: current lockups
I'm willing to bet a nickel (perhaps more) that you people are running non-IBM UDMA66 drives on that BP6. It seems that most UDMA66 drives are not actually UDMA66 compliant, and the only drives that have been reported successful on the BP6 are IBM. Try taking your HDs off the UDMA66 controller and putting them on the standard UDMA33 controllers, and it should clear things up. -- Marius Strom [EMAIL PROTECTED] Professional Geek/Unix System Administrator Alpha1 Internet http://www.alpha1.net http://www.marius.org/marius.pgp 0x42C74CBA *UPDATED PGP KEY 2/24/2000* In theory, there is no difference between theory and practice... ...In practice, there is a big difference. On Mon, 6 Mar 2000, Dan Papasian wrote: On Mon, Mar 06, 2000 at 08:27:18PM +0100, Dave Boers wrote: on a Dual celeron box (BP6) with UDMA66 enabled. Finally. I've been complaining about this on several occasions. I'm also running UDMA66 and Dual Celeron BP6. No overclocking. Can you people reproduce this on a kernel without SMP enabled? Perhaps there is a locking issue? However, that'd lead to a panic, I'd imagine. So see if you can reproduce this with one CPU running, so we can at least eliminate one of the variables. -Dan Papasian [EMAIL PROTECTED]
Re: current lockups
It is rumoured that Marius Strom had the courage to say: I'm willing to bet a nickel (perhaps more) that you people are running non-IBM UDMA66 drives on that BP6. It seems that most UDMA66 drives are not actually UDMA66 compliant, and the only drives that have been reported successful on the BP6 are IBM. Try taking your HDs off the UDMA66 controller and putting them on the standard UDMA33 controllers, and it should clear things up. I'm interested in the sources of your statement about IBM drives vs. non-IBM drives. In my case, I have a WD 18.2 GB 7200 rpm disk which has been reported to be identical to the IBM 18.2 GB 7200 rpm disk on more than one occasion. And by the way, my system was running quite stably before January 2000 with the same disk on the same controller and the same mainboard. Regards, Dave Boers. -- Dave Boers djb @ relativity . student . utwente . nl Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
Dave, Well, there was a discussion a few weeks back with Soren Schmidt and a few others. I believe the conclusion was that this occurred with most WD drives (interesting about the WD == IBM part; I did notice he mentioned that in -current a few weeks ago as well). I had a 20 gig WD that would just hang, and a number of other people had similar problems. (Theirs would log "Lost Disk Contact" in the dmesg, as their root dev wasn't a UDMA66 drive.) Unfortunately, the discussions occurred while the mailing list archive was kaput (WD drive on UDMA66? =]) so it's not archived where I can find it. It seems to happen only with the ata driver, IIRC. -- Marius Strom [EMAIL PROTECTED] Professional Geek/Unix System Administrator Alpha1 Internet http://www.alpha1.net http://www.marius.org/marius.pgp 0x42C74CBA *UPDATED PGP KEY 2/24/2000* In theory, there is no difference between theory and practice... ...In practice, there is a big difference. On Mon, 6 Mar 2000, Dave Boers wrote: It is rumoured that Marius Strom had the courage to say: I'm willing to bet a nickel (perhaps more) that you people are running non-IBM UDMA66 drives on that BP6. It seems that most UDMA66 drives are not actually UDMA66 compliant, and the only drives that have been reported successful on the BP6 are IBM. Try taking your HDs off the UDMA66 controller and putting them on the standard UDMA33 controllers, and it should clear things up. I'm interested in the sources of your statement about IBM drives vs. non-IBM drives. In my case, I have a WD 18.2 GB 7200 rpm disk which has been reported to be identical to the IBM 18.2 GB 7200 rpm disk on more than one occasion. And by the way, my system was running quite stably before January 2000 with the same disk on the same controller and the same mainboard. Regards, Dave Boers.
Re: current lockups
It is rumoured that Marius Strom had the courage to say: Well, there was a discussion a few weeks back with Soren Schmidt and a few others. I believe the conclusion was that this occurred with most WD drives (interesting about the WD == IBM part; I did notice he mentioned that in -current a few weeks ago as well). I had a 20 gig WD that would just hang, and a number of other people had similar problems. (Theirs would log "Lost Disk Contact" in the dmesg, as their root dev wasn't a UDMA66 drive.) Interesting. I'll check my own archives of -current to see if I can find the discussion. I always thought that the "Lost Disk Contact" messages were due to the disk recalibrating itself after six days of continuous use. After Soren increased the timeout from 5 to 10 seconds, I never saw the problem again, IIRC. For the record (see my mail elsewhere in the thread), I have recently added a U2W SCSI hard disk to the system, because I found that UDMA effectively cuts off memory access for the two Celerons for long periods, and because the Celerons haven't got nearly enough cache, they are effectively waiting for the IDE disk all the time. I'm now running my root filesystem on that drive (as well as most of my other important filesystems). So I guess that if your assertion is right, my problem should have gone away now. I haven't seen any "Lost Disk Contact" messages recently, however, though the UDMA66 drive is still connected. BTW, are there any people out there who have similar hangs and are NOT using UDMA66 or the ATA driver? Unfortunately, the discussions occurred while the mailing list archive was kaput (WD drive on UDMA66? =]) so it's not archived where I can find it. :-) Regards, Dave Boers. -- Dave Boers djb @ relativity . student . utwente . nl Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
Interesting. I'll check my own archives of -current to see if I can find the discussion. I always thought that the "Lost Disk Contact" messages were due to the disk recalibrating itself after six days of continuous use. After Soren increased the timeout from 5 to 10 seconds, I never saw the problem again, IIRC. Six days? Nah.. I had the problem occur anywhere from 5 minutes to 12 hours after a system boot. I moved the 20G WD to the UDMA33 channel, and it works flawlessly. Usually, I could reproduce the problem doing heavy disk I/O. However, one time I was able to make it through a "make buildworld", so that's not entirely true either. For the record (see my mail elsewhere in the thread), I have recently added a U2W SCSI hard disk to the system, because I found that UDMA effectively cuts off memory access for the two Celerons for long periods, and because the Celerons haven't got nearly enough cache, they are effectively waiting for the IDE disk all the time. I'm now running my root filesystem on that drive (as well as most of my other important filesystems). So I guess that if your assertion is right, my problem should have gone away now. I haven't seen any "Lost Disk Contact" messages recently, however, though the UDMA66 drive is still connected. For my record, I was unable to get dmesg output because the system was completely hung. Other people could get it because they had other drives to write logging information to when the UDMA drive was locked. --- Marius Strom [EMAIL PROTECTED] Professional Geek/Unix System Administrator Alpha1 Internet http://www.alpha1.net http://www.marius.org/marius.pgp 0x42C74CBA *UPDATED PGP KEY 2/24/2000* In theory, there is no difference between theory and practice... ...In practice, there is a big difference.
Re: current lockups
On 2000-Mar-06 21:39:11 +1100, Matthew Sean Thyer [EMAIL PROTECTED] wrote: My computer had been stable all winter (with setiathome running full time) but suddenly, come the Australian summer, it started freezing. And it's been the coldest summer for something like 5 years... How about these Peltier (sp ?) cooling devices I have heard about? A Peltier cell is just a semiconductor heat pump. It effectively just reduces the junction-to-heatsink thermal resistance, allowing you (in theory) to use a less efficient heatsink (or have the CPU run cooler with the same heatsink). The downside is that they're relatively inefficient - your power supply will need to supply an extra 3-4A at 12V and you need to dissipate that extra power. Unless you significantly improve the airflow through the case, you'll probably find that the internal temperature rises significantly - further stressing everything except the CPU. Note that the chip that most needs cooling may not be the CPU - the big support chips can also run very hot. Peter
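Peter's extra current draw can be put in watts with P = V * I. A minimal sketch of that arithmetic, using only the 12V and 3-4A figures from his message (the helper name power_watts is illustrative, not from the thread):

```python
def power_watts(volts: float, amps: float) -> float:
    """Electrical power in watts: P = V * I."""
    return volts * amps


# Peter's estimate: an extra 3-4 A drawn from the 12 V rail.
extra_low = power_watts(12.0, 3.0)
extra_high = power_watts(12.0, 4.0)
print(f"extra power to dissipate: {extra_low:.0f}-{extra_high:.0f} W")
```

That is roughly 36-48 W of additional heat that has to leave the case, on top of what the CPU itself already dissipates.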
Re: current lockups
On 2000-Mar-07 06:29:17 +1100, Dave Boers [EMAIL PROTECTED] wrote: It is rumoured that Arun Sharma had the courage to say: Compiling Mozilla with make -j 2 got -current to lock up, twice in succession. I'm running a fairly recent snapshot (a week or two old) on a Dual celeron box (BP6) with UDMA66 enabled. Finally. I've been complaining about this on several occasions. I'm also running UDMA66 and Dual Celeron BP6. No overclocking. Later postings mention possible problems with UDMA66. The other possibility that has been discussed recently is potential priority inversions for processes using rtprio and idprio. Note that ntpd will use rtprio if the Posix P1003.1b extensions aren't enabled in the kernel. (These were enabled by default in GENERIC on i386 in mid-January). If you have the new ntpd (rather than xntpd) and are running a kernel without options P1003_1B, _KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could potentially get a lockup due to a priority inversion. (Though I think the probability is very small). Peter
Re: current lockups
In message [EMAIL PROTECTED], Peter Jeremy writes: How about these Peltier (sp ?) cooling devices I have heard about? A Peltier cell is just a semiconductor heat pump. It effectively just reduces the junction-to-heatsink thermal resistance, allowing you (in theory) to use a less efficient heatsink (or have the CPU run cooler with the same heatsink). This is actually not true, quite the contrary in fact: you need a better heat-sink with a Peltier because of the significant electrical power you pump into it. As a general rule you can expect to *raise* your CPU temperature if you put a Peltier under anything less than a *very good* heat-sink.
Example:
A Celeron 500 dissipates about 25W.
An average heatsink is about .8 C/W.
delta-T becomes 25W * .8C/W = 20C.
At 30C ambient that becomes 50C CPU temperature.
Now, add a Peltier. To remove 25W and keep a 25C temperature difference we need to feed it about 50W. Now the heatsink has to deal with 25 + 50 W and the delta-T becomes:
(25W + 50W) * .8C/W = 60C
Subtract the 25C difference from the Peltier and add the ambient temperature and we find:
30C + 60C - 25C = 65C
We just raised our CPU temperature by about 15C :-(
-- Poul-Henning Kamp FreeBSD coreteam member [EMAIL PROTECTED] "Real hackers run -current on their laptop." FreeBSD -- It will take a long time before progress goes too far!
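A minimal sketch that reproduces Poul-Henning's arithmetic. It assumes his simplified steady-state model: the heatsink must carry the CPU's heat plus the Peltier's electrical input, and the Peltier then holds a fixed temperature difference between heatsink and CPU. All numbers (25W, .8 C/W, 30C ambient, 50W input, 25C difference) come from his example; the function name cpu_temp_c is hypothetical.

```python
def cpu_temp_c(ambient_c: float, cpu_w: float, sink_c_per_w: float,
               peltier_in_w: float = 0.0, peltier_delta_c: float = 0.0) -> float:
    """Steady-state CPU temperature under the simple model in the message.

    The heatsink must move the CPU's heat *plus* the Peltier's electrical
    input; the Peltier then keeps the CPU peltier_delta_c below the heatsink.
    """
    sink_rise_c = (cpu_w + peltier_in_w) * sink_c_per_w
    return ambient_c + sink_rise_c - peltier_delta_c


# Figures from the example: 25 W CPU, .8 C/W heatsink, 30 C ambient.
plain = cpu_temp_c(30.0, 25.0, 0.8)
# Same heatsink with a Peltier fed 50 W to hold a 25 C difference.
peltier = cpu_temp_c(30.0, 25.0, 0.8, peltier_in_w=50.0, peltier_delta_c=25.0)

print(f"without Peltier: {plain:.0f} C")    # 50 C
print(f"with Peltier:    {peltier:.0f} C")  # 65 C
print(f"change:          {peltier - plain:+.0f} C")
```

Leaving peltier_in_w and peltier_delta_c at zero gives the no-Peltier case, so the same function can be used to see how good the heatsink rating has to get before the Peltier stops being a net loss.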
Re: current lockups
It is rumoured that Peter Jeremy had the courage to say: Note that ntpd will use rtprio if the Posix P1003.1b extensions aren't enabled in the kernel. (These were enabled by default in GENERIC on i386 in mid-January). If you have the new ntpd (rather than xntpd) and are running a kernel without options P1003_1B, _KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could potentially get a lockup due to a priority inversion. (Though I think the probability is very small). I don't use ntpd (I use ntpdate) and I do have those options enabled in my kernel (all three of them). IIRC they are needed to get either cdrdao or cdrecord to work. Seems that everything points to UDMA66 so far... Regards, Dave Boers. -- Dave Boers djb @ relativity . student . utwente . nl Don't let your schooling interfere with your education. (Mark Twain)
Re: current lockups
On Mon, Mar 06, 2000 at 11:59:21PM +0100, Dave Boers wrote: It is rumoured that Peter Jeremy had the courage to say: Note that ntpd will use rtprio if the Posix P1003.1b extensions aren't enabled in the kernel. (These were enabled by default in GENERIC on i386 in mid-January). If you have the new ntpd (rather than xntpd) and are running a kernel without options P1003_1B, _KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could potentially get a lockup due to a priority inversion. (Though I think the probability is very small). I don't use ntpd (I use ntpdate) and I do have those options enabled in my kernel (all three of them). IIRC they are needed to get either cdrdao or cdrecord to work. Seems that everything points to UDMA66 so far... ...maybe in certain combinations. I have a BP6 with dual Celerons (466s @ 504) and have had no problems whatsoever.
FreeBSD 4.0-CURRENT #4: Sun Mar 5 12:20:41 PST 2000
[EMAIL PROTECTED]:/usr/src/sys/compile/NORN
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (503.92-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x665 Stepping = 5
Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 268369920 (262080K bytes)
avail memory = 256987136 (250964K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee0
cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee0
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec0
ad0: 9765MB FUJITSU MPC3102AT E [19841/16/63] at ata0-master using UDMA33
ad4: 12949MB IBM-DJNA-371350 [28064/15/63] at ata2-master using UDMA66
acd0: CDROM DELTA OPC-K101/ST1 F/W by OIPD at ata1-slave using PIO4
ad0 is a DOS drive, ad4 is what I have FreeBSD on.
-Chris -- [EMAIL PROTECTED] [EMAIL PROTECTED] Abbotsford, BC, Canada
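Two numbers in Chris's dmesg can be cross-checked by hand. A minimal sketch (not anything from the thread) that decodes the Features= word 0x183fbff using the standard CPUID leaf 1 EDX bit positions, limited to the 19 flags FreeBSD printed, and recomputes the ad0/ad4 sizes from their cylinder/head/sector geometry, assuming 512-byte sectors. The helper names are hypothetical, and the thread doesn't show whether the ata driver derives its MB figure this way; the products simply match here.

```python
# Bit positions (CPUID leaf 1, EDX) for exactly the flags listed in the
# Features= line of the dmesg; any other set bit is reported as "bitN".
EDX_FLAGS = {
    0: "FPU", 1: "VME", 2: "DE", 3: "PSE", 4: "TSC", 5: "MSR",
    6: "PAE", 7: "MCE", 8: "CX8", 9: "APIC", 11: "SEP", 12: "MTRR",
    13: "PGE", 14: "MCA", 15: "CMOV", 16: "PAT", 17: "PSE36",
    23: "MMX", 24: "FXSR",
}


def decode_features(word: int) -> list:
    """Return the flag names for the set bits, lowest bit first."""
    return [EDX_FLAGS.get(bit, f"bit{bit}")
            for bit in range(32) if word & (1 << bit)]


def chs_megabytes(cylinders: int, heads: int, sectors: int,
                  sector_bytes: int = 512) -> int:
    """Capacity in MB (2**20 bytes) implied by a C/H/S geometry."""
    return cylinders * heads * sectors * sector_bytes // (1024 * 1024)


print(",".join(decode_features(0x183FBFF)))
print("ad0:", chs_megabytes(19841, 16, 63), "MB")  # dmesg says 9765MB
print("ad4:", chs_megabytes(28064, 15, 63), "MB")  # dmesg says 12949MB
```

Both recomputed sizes agree with the dmesg, and the decoded flags come out in the same order the kernel printed them, since that line is listed in bit order.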
Re: current lockups
On Mon, Mar 06, 2000 at 03:46:25PM -0600, Marius Strom wrote: Unfortunately, the discussions occurred while the mailing list archive was kaput (WD Drive on UDMA66? =]) so it's not archived where I can find it. I think this is the thread you're looking for: http://marc.theaimsgroup.com/?t=9503732951w=2r=1 Tim
Re: current lockups
1. Is your computer overclocked? 2. Is the computer totally frozen? (i.e. scroll lock doesn't turn the light on) 3. Does similar load crash the box as well? (try make -j2 world) 4. Does it freeze in the same spot? 5. Is the computer not responding to pings? If you've answered yes to a good number of these questions, there is a good chance that your processor(s) are overheating. Try improving the airflow to the case (but using a household fan isn't recommended due to EMI). -Dan Papasian [EMAIL PROTECTED] On Sat, Mar 04, 2000 at 11:50:10PM -0800, Arun Sharma wrote: Compiling Mozilla with make -j 2 got -current to lock up, twice in succession. I'm running a fairly recent snapshot (a week or two old) on a Dual celeron box (BP6) with UDMA66 enabled. The kernel had DDB enabled. I was running X, but I didn't see any signs of the kernel attempting to get into the debugger. Has this been fixed ? Is anyone interested in investigating ? I'll post more info if I find anything. -Arun