Re: Seeing system-lockups on recent current
At 11:52 PM +0200 10/10/03, Dag-Erling Smørgrav wrote:
> Doug White <[EMAIL PROTECTED]> writes:
> > On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> > > For the past week or so, I have been having a frustrating
> > > time with my freebsd-current/i386 system. It is a dual
> > > Athlon system. [...]
> > It would be useful to isolate exactly what day the problem
> > started occurring.
>
> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+)
> which is barely a couple of months old. I ended up reverting to
> RELENG_5_1. With -CURRENT, both UP and SMP kernels will crash
> with symptoms which suggest hardware trouble. With RELENG_5_1,
> UP is rock solid (knock on wood) while SMP crashes within
> minutes of booting.

Just to follow up on this... My symptoms were different, in that I have problems with both UP and SMP (although UP did seem more stable). I also tried a clean install of 5.1-RELEASE (right off the CDs), and that would also hang up. Since I *know* this machine had been running fine back at the time of 5.1-release, this was pretty significant.

I took the PC back to the place I got it from, and they ran some kind of diagnostics on it and said the motherboard is bad. They're replacing the motherboard. So, unless I have something more to say when I get that back, it looks pretty likely that my headaches were hardware-related.

(my machine also has different components than des's machine)

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Seeing system-lockups on recent current
I'm seeing similar lockups; however, they started shortly after the new ATA code was committed. The lockups usually occur when there's a lot of ATA activity, e.g. filesystem or fsck. At the moment I can only guess as to what the problem might be (a missing interrupt is my most educated guess), but keeping the amount of ATA I/O to a minimum does help the situation.

Both machines which have suffered the problem have Intel chipsets. One is a 12 year old P120 (I cannot recall the exact chipset) and the other is a PIII with an 815E chipset.

On a couple of occasions I had systat running and noticed that buffers in use climbed until the system just froze, responding only to pings. In all cases all filesystems were generally "clean", just with the dirty bit set, except for the filesystem on an ATA drive (/var or /export), which required considerable cleanup. Filesystems that reside on SCSI devices have yet to exhibit any symptoms, e.g. requiring anything more than resetting the dirty bit.

Due to this problem I've yet to complete a portupgrade, something I've been trying to complete over the last four weeks, as it usually hangs the system within 12 hours.

Cheers,
--
Cy Schubert <[EMAIL PROTECTED]>      http://www.komquats.com/
BC Government            .       FreeBSD UNIX
[EMAIL PROTECTED]        .       [EMAIL PROTECTED]
http://www.gov.bc.ca/    .       http://www.FreeBSD.org/

In message <[EMAIL PROTECTED]>, Garance A Drosihn writes:
> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system. It is a dual Athlon
> system. It has been running -current just fine since December,
> with me updating the OS every week or two. I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.
>
> My first update run picked up a change which caused system
> panics. Other people were also seeing that panic, and it
> wasn't long before updates were committed to current to fix
> that problem. However, ever since then my -current system
> has very frequently locked up. Totally locked. The only way
> to get it back is a hardware reset.
>
> I have rebuilt the system at least a dozen times since then.
> I have built it with snapshots of /usr/src from Sept 12th
> to Oct 8th (which is what it's running at the moment). I
> have dropped back to a single-CPU kernel. I turned off X
> (in /etc/ttys) so that doesn't start up at all. All those
> attempts to get a reliable 5.x-system have not worked.
> Sometimes the system will crash in the middle of a buildworld,
> other times it will crash while it's basically idle and the
> monitor is turned off. One time it crashed in the middle of
> an installworld -- right when it was replacing /lib files.
> Boy was that a headache to recover from!
>
> On the same PC, in a different DOS partition, is a 4.x-stable
> system. If I boot into 4.x, I have no problems. I fire up
> all the servers that I run, start buildworlds, run cvsup's,
> and even had all the 5.x partitions mounted and was running
> an infinite loop that MD5'd every file in the 5.x system. I
> had all of that going on at the same time, and the system is
> fine. While in the 4.x system, I've removed /usr/src on the
> 5.x system and recreated it, just in case there were some
> files corrupted in there. And once the problems started, I
> made a point of always removing all of /usr/obj/usr/src
> before starting the buildworld, in case there were corrupted
> files in there.
>
> I still have a few things I want to try. And I know it could
> still be a hardware problem (although it bugs me that it fails
> so consistently on 5.x and never fails on 4.x). Perhaps it
> is just some disk-corruption problem that occurred during the
> first few panics. But I thought I'd at least mention it, and
> see if anyone else has been having similar problems.
Re: Seeing system-lockups on recent current
On Fri, 10 Oct 2003, Dag-Erling Smørgrav wrote:
> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which
> is barely a couple of months old. I ended up reverting to RELENG_5_1.

Same here: MSI K7D Master-L motherboard. With -CURRENT and an MP kernel, there is no way to complete a buildworld without a panic. Even buildkernel exits with random signals. With MP 4-STABLE, MP Dragonfly and WinXP the same machine is rock-stable.
Re: Seeing system-lockups on recent current
Don Lewis <[EMAIL PROTECTED]> writes:
> My Athlon XP 1900+/AMD 761 UP box is happily running a late October 6th
> version of -current.

XP != MP

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: Seeing system-lockups on recent current
On 10 Oct, Dag-Erling Smørgrav wrote:
> Doug White <[EMAIL PROTECTED]> writes:
>> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
>> > For the past week or so, I have been having a frustrating time
>> > with my freebsd-current/i386 system. It is a dual Athlon
>> > system. [...]
>> It would be useful to isolate exactly what day the problem started
>> occurring.
>
> I experienced similar problems on a dual Athlon system (MSI K7D
> Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which
> is barely a couple of months old. I ended up reverting to RELENG_5_1.
> With -CURRENT, both UP and SMP kernels will crash with symptoms which
> suggest hardware trouble. With RELENG_5_1, UP is rock solid (knock on
> wood) while SMP crashes within minutes of booting. I've run out of
> patience with this system, so I'll keep running RELENG_5_1 on it until
> someone manages to convince me that -CURRENT will run properly on AMD
> hardware (maybe around 5.3 or so...)

My Athlon XP 1900+/AMD 761 UP box is happily running a late October 6th version of -current.
Re: Seeing system-lockups on recent current
Doug White <[EMAIL PROTECTED]> writes:
> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> > For the past week or so, I have been having a frustrating time
> > with my freebsd-current/i386 system. It is a dual Athlon
> > system. [...]
> It would be useful to isolate exactly what day the problem started
> occurring.

I experienced similar problems on a dual Athlon system (MSI K7D Master-L motherboard, AMD 760MPX chipset, dual Athlon MP 2200+) which is barely a couple of months old. I ended up reverting to RELENG_5_1. With -CURRENT, both UP and SMP kernels will crash with symptoms which suggest hardware trouble. With RELENG_5_1, UP is rock solid (knock on wood) while SMP crashes within minutes of booting. I've run out of patience with this system, so I'll keep running RELENG_5_1 on it until someone manages to convince me that -CURRENT will run properly on AMD hardware (maybe around 5.3 or so...)

Now, my shiny new 2.4 GHz P4, on the other hand... *drool*

DES
--
Dag-Erling Smørgrav - [EMAIL PROTECTED]
Re: Seeing system-lockups on recent current
At 12:48 PM -0700 10/10/03, Doug White wrote:
> On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> > For the past week or so, I have been having a frustrating time
> > with my freebsd-current/i386 system. It is a dual Athlon
> > system. It has been running -current just fine since December,
> > with me updating the OS every week or two. I did not update it
> > for most of September, and then went to update it to pick up
> > the recent round of security-related fixes.
>
> It would be useful to isolate exactly what day the problem started
> occurring. That would simplify isolating the offending commit. Use
> the date specifier in cvsup to check out specific dates, then
> build & test.

I've done that. As mentioned in the message, I've done complete system rebuilds using snapshots from about Sept 12th to Oct 8th. The problem is that it's tedious to keep doing these rebuilds, when the very act of a buildworld or buildkernel can trigger the system lockup.

I really am torn between thinking that it's a change in -current and thinking it must be something about my specific system. Depending on which set of observations I pick, I can make an excellent case for either one being the culprit.

So, if no one else *is* seeing this kind of problem, then it's more likely to be my hardware (one way or another). I'll keep trying things.

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
Re: Seeing system-lockups on recent current
On Fri, 10 Oct 2003, Garance A Drosihn wrote:
> For the past week or so, I have been having a frustrating time
> with my freebsd-current/i386 system. It is a dual Athlon
> system. It has been running -current just fine since December,
> with me updating the OS every week or two. I did not update it
> for most of September, and then went to update it to pick up
> the recent round of security-related fixes.

It would be useful to isolate exactly what day the problem started occurring. That would simplify isolating the offending commit. Use the date specifier in cvsup to check out specific dates, then build & test.

--
Doug White                    |  FreeBSD: The Power to Serve
[EMAIL PROTECTED]             |  www.FreeBSD.org
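The date-isolation approach suggested above amounts to a bisection over snapshot dates. A minimal Python sketch follows, under some loudly stated assumptions: the helper names (`cvsup_date`, `first_bad_day`) are hypothetical, the `is_bad` callback stands in for the manual "check out that date, build, and test" step, and the `yyyy.mm.dd.hh.mm.ss` layout is what I understand cvsup's `date=` keyword to expect (check cvsup(1) before relying on it). It also assumes a single offending commit, so that every snapshot after the first bad one is bad too. The dates in the simulated run are made up.

```python
from datetime import date, timedelta


def cvsup_date(day):
    """Format a day as a value for cvsup's date= keyword (midnight).

    Assumed layout: yyyy.mm.dd.hh.mm.ss
    """
    return day.strftime("%Y.%m.%d.00.00.00")


def first_bad_day(good, bad, is_bad):
    """Return the earliest day for which is_bad(day) is true.

    Assumes `good` is known to work, `bad` is known to fail, and that
    the problem never goes away again once it has appeared.
    """
    while (bad - good).days > 1:
        # Test the snapshot halfway between the two known endpoints.
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Simulated run with invented dates: pretend the offending commit
# landed on 2003-09-20. In real use, is_bad would mean "I built and
# booted this snapshot and it locked up".
culprit = first_bad_day(
    date(2003, 9, 1),
    date(2003, 10, 8),
    lambda day: day >= date(2003, 9, 20),
)
print("first bad snapshot: date=" + cvsup_date(culprit))
```

Halving the range each time keeps the number of full rebuilds to about log2 of the number of days, which matters when each buildworld is slow and may itself trigger the lockup.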
Seeing system-lockups on recent current
For the past week or so, I have been having a frustrating time with my freebsd-current/i386 system. It is a dual Athlon system. It has been running -current just fine since December, with me updating the OS every week or two. I did not update it for most of September, and then went to update it to pick up the recent round of security-related fixes.

My first update run picked up a change which caused system panics. Other people were also seeing that panic, and it wasn't long before updates were committed to current to fix that problem. However, ever since then my -current system has very frequently locked up. Totally locked. The only way to get it back is a hardware reset.

I have rebuilt the system at least a dozen times since then. I have built it with snapshots of /usr/src from Sept 12th to Oct 8th (which is what it's running at the moment). I have dropped back to a single-CPU kernel. I turned off X (in /etc/ttys) so that doesn't start up at all. All those attempts to get a reliable 5.x-system have not worked. Sometimes the system will crash in the middle of a buildworld, other times it will crash while it's basically idle and the monitor is turned off. One time it crashed in the middle of an installworld -- right when it was replacing /lib files. Boy was that a headache to recover from!

On the same PC, in a different DOS partition, is a 4.x-stable system. If I boot into 4.x, I have no problems. I fire up all the servers that I run, start buildworlds, run cvsup's, and even had all the 5.x partitions mounted and was running an infinite loop that MD5'd every file in the 5.x system. I had all of that going on at the same time, and the system is fine. While in the 4.x system, I've removed /usr/src on the 5.x system and recreated it, just in case there were some files corrupted in there. And once the problems started, I made a point of always removing all of /usr/obj/usr/src before starting the buildworld, in case there were corrupted files in there.

I still have a few things I want to try. And I know it could still be a hardware problem (although it bugs me that it fails so consistently on 5.x and never fails on 4.x). Perhaps it is just some disk-corruption problem that occurred during the first few panics. But I thought I'd at least mention it, and see if anyone else has been having similar problems.

--
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
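The "MD5 every file" stress loop mentioned in the message above can be approximated as follows. This is only a sketch in Python, not the command that was actually run on the 4.x system; the function names are hypothetical. The idea is to checksum every regular file under a mount point, repeat the pass, and flag any file whose digest changed even though nothing wrote to it, which would point at disk, controller, or memory trouble rather than at the OS.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_tree(root):
    """Map each regular file under root to its MD5 hex digest.

    Symlinks are skipped; files that cannot be read map to None.
    """
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                sums[path] = md5_of_file(path)
            except OSError:
                sums[path] = None
    return sums


def changed_files(before, after):
    """List paths present in both passes whose digests differ."""
    return sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
```

A usage pattern would be to call `md5_tree` on the mounted 5.x partitions, call it again in a loop, and print `changed_files(previous, current)` after each pass; on a healthy, otherwise idle filesystem that list should stay empty.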