Re: current lockups

2000-03-07 Thread Dave Boers

It is rumoured that Vallo Kallaste had the courage to say:
 I had a lockup yesterday while stress-testing new SMP machine. Tyan
 motherboard with Intel GX chipset, 256MB of memory, one 20GB IBM UDMA66
 disk, but running at UDMA33. All power management disabled completely in
 the BIOS. I was doing massive parallel compiling of GENERIC kernels.
 I let the machine do this overnight and in the morning the console had
 about 20 'microuptime() went backwards' messages. I was able to switch
 vtys but not log in; machine responded to pings, no disk activity. I'm
 using ata driver and only one unusual kernel option HZ=1000.

Your symptoms are not the same as mine. In my case the lockups are
complete. No switching of vt's, no pings, nothing at all. 

I never saw any "microuptime() went backwards" messages either. But then
again, I never had the machine lock up on the console; I was usually logged
in over the network or working in X. 

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message



Re: current lockups

2000-03-06 Thread Dave Boers

It is rumoured that Arun Sharma had the courage to say:
 Compiling Mozilla with make -j 2 got -current to lock up, twice in
 succession. I'm running a fairly recent snapshot (a week or two old)
 on a Dual celeron box (BP6) with UDMA66 enabled.

Finally. I've been complaining about this on several occasions. I'm also
running UDMA66 and Dual Celeron BP6. No overclocking. 
 
 The kernel had DDB enabled. I was running X, but I didn't see any
 signs of the kernel attempting to get into the debugger.

Ditto here. 
 
 Has this been fixed ? Is anyone interested in investigating ?
 I'll post more info if I find anything.

I'm interested in the fix, of course :-) But where to start looking? I've
had three lockups so far (none before January 2000) but I didn't find
anything that reliably triggered it. 

Regards, 

Dave. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Small bug in chown and chgrp ?

2000-03-06 Thread Dave Boers

Hi all, 

I've been bitten by the following: 

44 relativity ~ % chown -v djb:wheel test
chown: illegal option -- v
usage: chown [-R [-H | -L | -P]] [-f] [-h] [-v] owner[:group] file ...
   chown [-R [-H | -L | -P]] [-f] [-h] [-v] :group file ...
   chgrp [-R [-H | -L | -P]] [-f] [-h] [-v] group file ...

Where "test" is an ordinary directory. 

It seems that chown's behavior is inconsistent with both the usage message
and the man page. The same goes for chgrp. 

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: Small bug in chown and chgrp ?

2000-03-06 Thread Dave Boers

It is rumoured that Alfred Perlstein had the courage to say:
 have you deleted your stale copies of chown/chgrp?  hint look in 
 /bin /sbin /usr/bin /usr/sbin and make sure the old ones aren't
 "in the way".

Yes I have. Because I got a new disk, I did a fresh install of -current a
few weeks ago, well after chown/chgrp moved from /bin and /sbin to
/usr/bin and /usr/sbin. I double-checked to make sure, and the only versions
of chown/chgrp are the ones in /usr/sbin and /usr/bin respectively. 
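
A minimal sketch of such a check, assuming a POSIX sh. The helper name
find_copies is made up for illustration; the four directories are the ones
named in the hint above:

```sh
#!/bin/sh
# find_copies NAME DIR...: print every executable file called NAME
# that exists in one of the given directories.
find_copies() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
}

# Look for chown and chgrp in the usual system directories.
for prog in chown chgrp; do
    find_copies "$prog" /bin /sbin /usr/bin /usr/sbin
done
```

More than one line of output per program would suggest a stale copy is still
around; `command -v chown` then shows which one the shell actually picks up.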

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: current lockups

2000-03-06 Thread Dave Boers

It is rumoured that Arun Sharma had the courage to say:
 The cooling theory sounds the most plausible so far. I'm not over clocking
 my CPUs (Celeron 366s) and have appropriate cooling installed. But the
 machine is kept in a small room, with a bunch of other machines and gets
 a bit warm at times.

My system has been at 50 degrees Celsius for the past half year or so. Yet,
the lockups only started occurring around January 2000. Once again, my
system is not overclocked and the temperature is well within Intel's and
Abit's temperature specifications, so there shouldn't be hardware problems. 

 There has been no reproducible case of locking up. Each one looks different.
 But most were triggered by heavy compilation and I/O. One was a lockup
 overnight with no activity on the system. When it happens, it does not
 respond to pings or scroll lock.

Most of my lockups occurred when the system was relatively idle. Mostly
they happened only after 9 - 11 days of uptime. As you say, each one looks
different and there doesn't seem to be a pattern to it. When it locks up,
there is no response to the console, the network or the serial terminal.
Only the reset button is obeyed. I have DDB in my kernel, but there's no
getting into it. Also, no log messages of any kind from just before the
lockups.  

 If you'd like to do something about it, working on getting a reproducible
 hang would be the most beneficial one.

That's what I have been trying to do for the past few weeks, but I can't
seem to trigger it. Uptime is now 2 days and I intend to let it run to 12
or so before running make installworld again, to see if I can reproduce it.
However, I did recently change from UDMA66 to a U2W SCSI disk for my main
partitions (/, /usr, /var, /tmp and swap). It may have an impact on the
situation and it is the reason for the short uptime. If the problem has
gone away now, it might indicate something with the ATA driver. I'll keep
you informed. So far, since the disk change I've been putting my system
under some heavy load from time to time (like building three large ports
and make -j 12 buildworld at the same time). So far, the system is quite
stable. 
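
One way to make such load runs easier to correlate with a hang is to wrap
them in a loop that timestamps each pass to a log file. This is only a
sketch, assuming a POSIX sh; the function name run_load_loop and the log
path in the usage comment are hypothetical:

```sh
#!/bin/sh
# run_load_loop LOG COUNT CMD...: run CMD COUNT times and append a
# timestamped start/end line to LOG for every pass.
run_load_loop() {
    log=$1
    count=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') start pass $i" >> "$log"
        sync    # ask the kernel to flush the log line before the load starts
        status=0
        "$@" || status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') end pass $i (exit status $status)" >> "$log"
        sync
        i=$((i + 1))
    done
}

# Hypothetical usage, run from /usr/src:
#   run_load_loop /var/tmp/lockup.log 50 make -j 12 buildworld
```

After a hard lockup and reset, a "start pass" line with no matching "end
pass" line gives a rough idea of when, and during which pass, the machine
stopped.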

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: current lockups

2000-03-06 Thread Dave Boers

It is rumoured that Marius Strom had the courage to say:
 I'm willing to bet a nickel (perhaps more) you people are running non-IBM
 UDMA66 drives on that BP6.  Seems that most UDMA66 drives are not actually
 UDMA66 compliant, and the only drives that have been reported successful
 on the BP6 are IBM.  Try taking your HD's off the UDMA66 controller and
 put them on the Standard UDMA33 controllers, and it should clear things
 up.

I'm interested in the sources of your statement about IBM drives vs.
non-IBM drives. 

In my case, I have a WD 18.2 GB 7200 rpm disk which has been reported to be
identical to the IBM 18.2 GB 7200 rpm disk on more than one occasion. And
by the way, my system has been running quite stable before January 2000
with the same disk on the same controller and the same mainboard. 

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: current lockups

2000-03-06 Thread Dave Boers

It is rumoured that Marius Strom had the courage to say:
 Well, there was a discussion a few weeks back with Soren Schmidt and a few
 others.  I believe the conclusion was made that this occurred with most WD
 drives (interesting about the WD == IBM part, I did notice he mentioned
 that in -current a few weeks ago as well).  I had a WD20 gig that would
 just hang, and a number of other people had similar problems. (Theirs
 would log "Lost Disk Contact" in the dmesg as their root dev wasn't a
 UDMA66 drive)

Interesting. I'll check my own archives of -current to see if I can find
the discussion. I always thought that the "Lost Disk Contact" messages were
due to the disk recalibrating itself after six days of continued use. After
Soren increased the timeout from 5 to 10 seconds, I never saw the problem
again, IIRC. 

For the record (see my mail elsewhere in the thread), I have recently added
a U2W SCSI hard disk to the system (because I found that UDMA effectively
cuts off memory access for the two Celerons for long periods, and because
the Celerons don't have nearly enough cache they end up waiting for the
IDE disk all the time). I'm now running my root filesystem on that drive
(as well as most of my other important filesystems). So I guess that if
your assertion is right then my problem should have gone away now. I
haven't seen any "Lost Disk Contact" messages recently, however, even
though the UDMA66 drive is still connected. 

BTW, are there any people out there that have similar hangs and are NOT
using UDMA66 or the ATA driver ? 

 Unfortunately, the discussions occurred while the mailing list archive was
 kaput (WD Drive on UDMA66? =]) so it's not archived where I can find it.

:-)
 
Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: current lockups

2000-03-06 Thread Dave Boers

It is rumoured that Peter Jeremy had the courage to say:
 Note that ntpd will use rtprio if the Posix P1003.1b extensions aren't
 enabled in the kernel.  (These were enabled by default in GENERIC on
 i386 in mid-January).  If you have the new ntpd (rather than xntpd)
 and are running a kernel without options P1003_1B,
 _KPOSIX_PRIORITY_SCHEDULING and _KPOSIX_VERSION=199309L, you could
 potentially get a lockup due to a priority inversion.  (Though I
 think the probability is very small).

I don't use ntpd (I use ntpdate) and I do have those options enabled in my
kernel (all three of them). IIRC they are needed to get either cdrdao or
cdrecord to work. 
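
For anyone who wants to verify this on their own box, here is a rough
sketch (POSIX sh assumed) that checks a kernel config file for the three
options named above. The function name check_posix_opts and the config
file name MYKERNEL are hypothetical:

```sh
#!/bin/sh
# check_posix_opts CONF: report which of the three POSIX scheduling
# options have no uncommented "options" line in kernel config file CONF.
# Returns 0 if all three are present, 1 otherwise.
check_posix_opts() {
    conf=$1
    missing=0
    for opt in P1003_1B _KPOSIX_PRIORITY_SCHEDULING _KPOSIX_VERSION; do
        # Match "options OPT", optionally quoted, optionally followed by "=value".
        pattern="^[[:space:]]*options[[:space:]]+\"?${opt}([^A-Za-z0-9_]|\$)"
        if ! grep -Eq "$pattern" "$conf"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage:
#   check_posix_opts /usr/src/sys/i386/conf/MYKERNEL
```

This only inspects the config file, not the running kernel, so it says
nothing about a kernel that was built from a different config.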

Seems that everything points to UDMA66 so far...

Regards, 

Dave Boers. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)





Re: current.freebsd.org (FTP)

2000-02-29 Thread Dave Boers

It is rumoured that Forrest Aldrich had the courage to say:
 No, it allows you to log in, but will not accept anonymous logins.
 Login Incorrect

This has been going on for nearly 20 hours now. About 20 hours ago the
machine was briefly unreachable and when it came online again it refused
logins. 

Regards, 

Dave. 

-- 
  Dave Boers  djb @ relativity . student . utwente . nl 
  Don't let your schooling interfere with your education. (Mark Twain)

