vm86 problem in -CURRENT

2001-07-18 Thread Kazutaka YOKOTA

I am trying to track down a vm86 problem in -CURRENT.  The trouble is
that this is an intermittent problem and is not always reproducible.

The last one I managed to capture was:

==
VESA: set_mode(): 24(18) - 259(103)
VESA: about to set a VESA mode...


Fatal trap 12: page fault while in vm86 mode
cpuid = 1; lapic.id = 0100
fault virtual address   = 0xc638b
fault code  = user read, page not present
instruction pointer = 0xc000:0x638b
stack pointer   = 0x0:0xf9c
frame pointer   = 0x0:0xfdc
code segment= base 0x2, limit 0x504eb, type 0x0
= DPL 0, pres 0, def32 0, gran 0
processor eflags= interrupt enabled, resume, vm86, IOPL = 0
current process = 444 (logo_saver)
kernel: type 12 trap, code=0
Stopped at  0x638b:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 0100
fault virtual address   = 0x638b
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc02c0ad0
stack pointer   = 0x10:0xc91a8e44
frame pointer   = 0x10:0xc91a8e48
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 444 (logo_saver)
 kernel: type 12 trap, code=0
db trace


Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 0100
fault virtual address   = 0x638b
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc02c0ad0
stack pointer   = 0x10:0xc91a8d68
frame pointer   = 0x10:0xc91a8d6c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 444 (logo_saver)
kernel: type 12 trap, code=0
db
==

This was generated when I ran a small test program which tried to
switch to the VESA 800x600 256 color mode via syscons ioctl.  This
ioctl leads to vm86_intcall() to call the VESA video BIOS.  (I had a
report from a -CURRENT user that setting a VESA mode by `vidcontrol
VESA_800x600' bombed his system too.)  My test program and vidcontrol
have no trouble in RELENG_4 (and RELENG_3).

The trap was generated inside the VESA video BIOS ROM. The instruction
in question was:
repe stosw es(edi)
where es and edi were set to 0xa000 and 0x0 respectively immediately
before the above instruction.  (The video BIOS was trying to clear the
video ram.)

As far as I understand, the entire 1M bytes of lower physical memory
is supposed to be mapped when vm86_intcall() is run.  Apparently
0xc0000, where the video BIOS ROM resides, is mapped OK.  But, somehow
0xa0000, where the video RAM is, went missing.  As I wrote before,
this test program sometimes runs fine, sometimes does not.
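
For reference, the addresses in the trap output follow the usual
real-mode rule (linear = segment * 16 + offset).  A minimal Python
sketch of that arithmetic, not kernel code; the helper names are made
up for illustration and a 4 KB page size is assumed:

```python
PAGE_SIZE = 4096  # assumed i386 page size in bytes


def linear_address(segment: int, offset: int) -> int:
    """Translate a real-mode/vm86 segment:offset pair to a linear address."""
    return (segment << 4) + offset


def page_index(address: int) -> int:
    """Index of the 4 KB page that contains the given linear address."""
    return address // PAGE_SIZE


# The faulting instruction pointer from the trap message, 0xc000:0x638b
fault_ip = linear_address(0xC000, 0x638B)
# The stosw destination, es = 0xa000 and edi = 0x0
video_ram = linear_address(0xA000, 0x0)

print(hex(fault_ip), hex(page_index(fault_ip)))
print(hex(video_ram), hex(page_index(video_ram)))
```

So the page to look for in the vm86 mapping would be page 0xa0 (and
the 95 pages after it up to the 1 MB boundary).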

Where in the kernel should I investigate further?

Kazu


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: libedit replacement for libreadline

2001-07-18 Thread Terry Lambert

Andrey A. Chernov wrote:
> > > Okay.  So it sounds like there's a shim to libedit which would be
> > > the API replacement for libreadline.  Could we call that something
> > > cute like 'libreadlinele' ('le' for 'libedit') or 'libeditrl', but
> > > leave libreadline as a separate port?
> >
> > How about 'libedit'? :) I could live with that; it's just some
> > makefile changes.
>
> I vote this too. We don't need stripped down libreadline under
> 'libreadline' name pretend to be full version (f.e. for autoconf, etc.)

The cryptography libraries have set a precedent here.  I
could argue the same thing about the presence of a full DES
in libcrypt.

-- Terry




Re: libedit replacement for libreadline

2001-07-18 Thread Maxim Sobolev

Terry Lambert wrote:

> Andrey A. Chernov wrote:
> > > > Okay.  So it sounds like there's a shim to libedit which would be
> > > > the API replacement for libreadline.  Could we call that something
> > > > cute like 'libreadlinele' ('le' for 'libedit') or 'libeditrl', but
> > > > leave libreadline as a separate port?
> > >
> > > How about 'libedit'? :) I could live with that; it's just some
> > > makefile changes.
> >
> > I vote this too. We don't need stripped down libreadline under
> > 'libreadline' name pretend to be full version (f.e. for autoconf, etc.)
>
> The cryptography libraries have set a precedent here.  I
> could argue the same thing about the presence of a full DES
> in libcrypt.

I failed to understand what you are trying to say. Do you mean that we have
to follow a bad practice set by that precedent at any cost, or did I parse your
message incorrectly?

-Maxim





Re: libedit replacement for libreadline

2001-07-18 Thread Terry Lambert

Maxim Sobolev wrote:
> > > I vote this too. We don't need stripped down libreadline under
> > > 'libreadline' name pretend to be full version (f.e. for autoconf, etc.)
> >
> > The cryptography libraries have set a precedent here.  I
> > could argue the same thing about the presence of a full DES
> > in libcrypt.
>
> I failed to understand what you are trying to say. Do you mean that we have
> to follow a bad practice set by that precedent at any cost, or did I parse your
> message incorrectly?

I'm saying fix it in both places, or it obviously is not a
sufficient justification for a decision.

Or to put it another way: if you are willing to live with
it in one place, why not two?

-- Terry




installworld failures in calendar

2001-07-18 Thread Brooks Davis

I've been seeing the following failure in my installworlds for the past
week or so.  I've been getting around it with make -k, but it's kinda
annoying.  Just now I took a look at etc/mtree/BSD.user.dist and noticed
that the directory we're actually creating is de_DE.ISO_8859-1 not
de_DE.ISO8859-1.  Which is correct?

===> usr.bin/calendar
install -c -o root -g wheel -m 444
/usr/src/usr.bin/calendar/calendars/calendar.* /usr/share/calendar
install -c -o root -g wheel -m 444
/usr/src/usr.bin/calendar/calendars/de_DE.ISO8859-1/calendar.*
/usr/share/calendar/de_DE.ISO8859-1;
usage: install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
   [-o owner] file1 file2
   install [-bCcpSsv] [-B suffix] [-f flags] [-g group] [-m mode]
   [-o owner] file1 ... fileN directory
   install -d [-v] [-g group] [-m mode] [-o owner] directory ...
*** Error code 64

Stop in /usr/src/usr.bin/calendar.
*** Error code 1

-- Brooks

-- 
Any statement of the form X is the one, true Y is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4



Re: installworld failures in calendar

2001-07-18 Thread Brooks Davis

On Wed, Jul 18, 2001 at 10:18:06AM -0700, Brooks Davis wrote:
> I've been seeing the following failure in my installworlds for the past
> week or so.  I've been getting around it with make -k, but it's kinda
> annoying.  Just now I took a look at etc/mtree/BSD.user.dist and noticed
> that the directory we're actually creating is de_DE.ISO_8859-1 not
> de_DE.ISO8859-1.  Which is correct?

Ok, now I feel like an idiot.  The failure is as described, but the
mtree bit is wrong.  I was looking at a STABLE system by accident there.

-- Brooks



Re: vm86 problem in -CURRENT

2001-07-18 Thread Jonathan Lemon

In article local.mail.freebsd-current/[EMAIL PROTECTED] you write:
>As far as I understand, the entire 1M bytes of lower physical memory
>is supposed to be mapped when vm86_intcall() is run.  Apparently
>0xc0000, where the video BIOS ROM resides, is mapped OK.  But, somehow
>0xa0000, where the video ram is, went missing.  As I wrote before,
>this test program sometimes runs fine, sometimes does not.


When you make a vm86 call from the kernel, it uses a private page
table (located at vm86paddr) in order to map the pages.  The details
of the layout can be found in i386/i386/vm86.c.

This page table is initially populated in locore.s, and contains
only page 0 + the ISA hole (0xa0000 - 0x100000).  If getmemsize()
detects that there is a hole between basemem and ISA memory, the pages
in this hole will additionally be mapped into the vm86 space.

If you're getting a page fault while trying to access 0xa0000, then
it would seem that the entries in the vm86 page table are incorrect.
You can check this by examining the page table located at vm86paddr.
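
As a rough illustration of that check, here is a Python model (not
kernel code) of a page table holding page 0 plus the ISA hole from
640 KB to 1 MB, with a helper that lists pages whose present bit is
clear.  The function names and the identity mapping are assumptions
for the sketch; only the layout comes from the description above, and
bit 0 is the i386 "present" bit.

```python
PAGE_SHIFT = 12   # 4 KB pages
PG_V = 0x001      # i386 page table entry "present/valid" bit


def build_initial_table():
    """Hypothetical model: page 0 and the ISA hole, identity-mapped."""
    table = [0] * 1024
    table[0] = (0 << PAGE_SHIFT) | PG_V
    for page in range(0xA0000 >> PAGE_SHIFT, 0x100000 >> PAGE_SHIFT):
        table[page] = (page << PAGE_SHIFT) | PG_V
    return table


def unmapped_pages(table, start, end):
    """Return page numbers in [start, end) whose entry is not present."""
    first, last = start >> PAGE_SHIFT, end >> PAGE_SHIFT
    return [p for p in range(first, last) if not (table[p] & PG_V)]


table = build_initial_table()
print(unmapped_pages(table, 0xA0000, 0x100000))  # healthy table: nothing missing

table[0xA0] = 0  # simulate the entry for 0xa0000 having been clobbered
print([hex(p) for p in unmapped_pages(table, 0xA0000, 0x100000)])
```

On a live system the same walk would be done against the real table
at vm86paddr, for instance from the debugger.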
-- 
Jonathan




Re: libedit replacement for libreadline

2001-07-18 Thread Max Khon

hi, there!

On Tue, 17 Jul 2001, Garance A Drosihn wrote:

 Personally, I think it's worth it to get rid of a GNU dependency
 in the base system, as well as reducing the overall amount of
 functional code duplication.
 
 I may be misunderstanding what you mean here, but I don't think
 we should replace libreadline with libedit.  However, I do find
 this very interesting, as some of my friends and I have a program
 that we're going to switch from gnu to bsd licensing, and it
 would be nice if we could use this libedit instead of libreadline.

I read on the pgsql-hackers mailing list that only static linking with
libreadline will infect your binaries with the GPL virus.
BTW, PostgreSQL already has support for NetBSD's libedit.

/fjoe





netpbm broken on current crtn.o new binutils ?

2001-07-18 Thread Manfred Antar

The port netpbm builds and installs fine on current.
When trying to build the docs in /usr/docs the program peps calls 
/usr/local/bin/pnmtopng
This is from the netpbm port.

(/usr/local/bin)505}./pnmtopng 
/usr/libexec/ld-elf.so.1: /usr/lib/crtn.o: unsupported file type
 
(/usr/local/bin)506} ldd ./pnmtopng
./pnmtopng:
libpnm.so.1 => /usr/local/lib/libpnm.so.1 (0x2806c000)
libppm.so.1 => /usr/local/lib/libppm.so.1 (0x28073000)
libpgm.so.1 => /usr/local/lib/libpgm.so.1 (0x2807c000)
libpbm.so.1 => /usr/local/lib/libpbm.so.1 (0x2807f000)
libpng.so.4 => /usr/local/lib/libpng.so.4 (0x2808d000)
libm.so.2 => /usr/lib/libm.so.2 (0x280ad000)
libc.so.5 => /usr/lib/libc.so.5 (0x280c9000)
=crtn.o (0x0)
libz.so.2 => /usr/lib/libz.so.2 (0x2817d000)
Manfred




==
||  [EMAIL PROTECTED]   ||
||  Ph. (415) 681-6235  ||
==





Re: more on supermicro 6010H hang

2001-07-18 Thread John Polstra

In article [EMAIL PROTECTED],
Dave Cornejo  [EMAIL PROTECTED] wrote:
 I have isolated the point at which current no longer runs as Jan 31 -
 Feb 1 of this year.  Prior versions work fine; in Feb & Mar I get
 either Kernel trap 9 with interrupts disabled or I think the same
 thing with trap 26 (really not sure on that one).
 
 Next I took a brand new current from this evening and tried it - it
 still hangs, but a keypress on the keyboard pretty much always breaks
 it out of the hang and into a normal boot.
 
 Now, I finally got the equipment and time together to remote gdb the
 bad kernel and here's what I get:
 
 I set a breakpoint at cam_xpt.c::xpt_config() - this is where the
 Waiting 15 seconds.. message is from and stepped down through it.  I
 get through the first xpt_for_all_busses (xptconfigbuscountfunc,...)
 and then I hit the second one (~line 6749 of cam_xpt.c) I pass through
 several things, including the xptconfigfunc() and end up in
 subr_autoconf.c::run_interrupt_driven_config_hooks().  At the bottom
 of this function there is a tsleep that gets called - this is
 apparently where it hangs.  If I hit a key on the keyboard it will
 continue on past this point and all seems to work fine from then on.

This is probably not it, but it's worth a peek.  Check your BIOS
settings and see if there's one that controls whether the USB
interrupt is enabled.  Make sure that this interrupt is enabled.  If
it's not, I know you can get hangs at exactly the point where the
Waiting 15 seconds.. message comes out.

John
-- 
  John Polstra   [EMAIL PROTECTED]
  John D. Polstra & Co., Inc.        Seattle, Washington USA
  Disappointment is a good sign of basic intelligence.  -- Chögyam Trungpa





leave patch

2001-07-18 Thread David Hill

Hello -
This leave patch gets rid of whitespace before the input, and after the +,
if there is one.

I also moved the #define's to the top of the source file, and changed 1 to
STDOUT_FILENO.

The patch is included with this email, and is available online at
http://www.phobia.ms/patches/leave.c.18072001.patch

Thanks
- David
 leave.c.18072001.patch


fatal flaw in diskcheckd...

2001-07-18 Thread Matthew Jacob


So, I took a SCSI disk away. Diskcheckd started complaining. However,
a camcontrol rescan couldn't make the disk go away until I killed off
diskcheckd, which then closed the disk, allowing the rescan to remove it.
Bad. Bad. Bad.

ev/da4
Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908502 on
/dev/da4
Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908505 on
/dev/da4
Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908506 on
/dev/da4
Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908513 on
/dev/da4
Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908629 on
/dev/da4
Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908636 on
/dev/da4
(da4:isp3:0:5:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
(da4:isp3:0:5:0): removing device entry






Re: fatal flaw in diskcheckd...

2001-07-18 Thread Alfred Perlstein

* Matthew Jacob [EMAIL PROTECTED] [010718 16:33] wrote:
 
 So, I took a SCSI disk away. Diskcheckd started complaining. However,
 a camcontrol rescan couldn't make the disk go away until I killed off
 diskcheckd, which then closed the disk, allowing the rescan to remove it.
 Bad. Bad. Bad.
 
 ev/da4
 Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908502 on
 /dev/da4
 Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908505 on
 /dev/da4
 Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908506 on
 /dev/da4
 Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908513 on
 /dev/da4
 Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908629 on
 /dev/da4
 Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908636 on
 /dev/da4
 (da4:isp3:0:5:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
 (da4:isp3:0:5:0): removing device entry

Is diskcheckd still on by default?  If so, can whoever enabled it
turn it off?  If not, I'll be 'fixing' this oversight this afternoon.

thanks,
-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
Ok, who wrote this damn function called '??'?
And why do my programs keep crashing in it?




Large patchset for the floppy driver

2001-07-18 Thread Joerg Wunsch

Hello all,

before leaving for vacation next week, i thought i'd put up the result
of my work of the last couple of weeks for a review for those who are
interested.

The patch itself is available at

http://people.freebsd.org/~joerg/fdpatch.txt.gz

The following README file (see below) is also available at

http://people.freebsd.org/~joerg/README.floppy

Suggestions are welcome, but keep in mind that i won't have the
opportunity to reply within ca. the next three weeks.

==
Many changes and improvements to the floppy driver.

First of all, something i've meant to implement for years now, but
never got around to: automatic format (density) selection for the major
medium formats.  That is, you can stuff your 720 KB medium into your
1.44 MB drive, and simply access /dev/fd0.  No questions asked, it
will just handle it. :)  (If you ever wondered why this didn't happen
before, the answer is very simple: i needed the infrastructure to read
a sector ID field before.  That was one of the recent additions to the
driver.)

The driver can now handle FM floppies as well as MFM floppies.  The
NE765 commands in ne765.h have been redefined to no longer make
implicit assumptions about MFM, SK, and MT; the driver adds those bits
as required.

The entire density handling subsystem with its bloated list of
different possible media density and its twisted decisions about which
drive can handle what has been completely moved out of the kernel,
into fdcontrol(8) where it should have been all the time.  It's very
simple, the basic device (four low order bits equal 0) is the
subdevice with automagic density selection, and subdevices 1 through
15 are available for fixed density assignments.  By default, all of
the latter are being initialized to the drive's maximum possible
density.  It's up to fdcontrol(8) to customize those subdevices.
Subdevice naming has been liberalized; subdevices `a' through `h' are
still implemented as pseudo-partitions which just cause a symlink to
be created to the master device.  Subdevices of the form fdN.M can be
arbitrarily created, where M means between 1 and 4 digits.  So you
can now either create your subdevices like fd0.1 through fd0.15, or
you can still name them by density (as it used to be until now, but
with a fixed set of allowable names), as in fd1.800 etc.

Automagic density selection honors the unit attention bit
(aka. changeline signal).  It is checked in Fdopen(), and if it is
set, the autoselection is started.  If you re-open the same device
without changing media, no new autoselection is initiated.

Some old hacks have been made right, now that the infrastructure seems
to be in place.  There's no longer a configuration flag to fdc(4) that
says a single 1.44 MB drive should be assumed without probing.
On non-i386 architectures, there's no longer an assumption that all
the world is a 1.44 MB drive; on i386 architectures there's no longer
the assumption that you could only have two drives that need to be
mentioned in the RTC (although you need more than one controller for
more drives, and i haven't tested that yet).  Drive configuration is
now handled through device hints and a `flags' value per drive: the
lower 4 bits define the drive type (like in i386 RTC, but shifted
right by 4), bit 0x10 disables changeline support, bit 0x20 forces the
device to probe successfully without making a seek test first.  On
i386 architectures, if the device type hasn't been set by the flags,
and the fd unit number is 0 or 1, the CMOS RTC is still queried as it
used to be (but you could now override this).  (Detection of the i386
architecture has been changed to test for _MACHINE_ARCH instead of
testing whether we are compiling on __i386__.)
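
To illustrate the per-drive `flags' layout just described, here is a
small sketch in Python.  The constant and function names are
hypothetical and not taken from the patch; only the bit positions come
from the text above.

```python
# Hypothetical names; the bit positions follow the README text.
FD_TYPE_MASK = 0x0F       # lower 4 bits: drive type
FD_NO_CHANGELINE = 0x10   # disable changeline support
FD_FORCE_PROBE = 0x20     # probe successfully without a seek test


def decode_fd_flags(flags: int) -> dict:
    """Split a per-drive flags value into its documented fields."""
    return {
        "drive_type": flags & FD_TYPE_MASK,
        "no_changeline": bool(flags & FD_NO_CHANGELINE),
        "force_probe": bool(flags & FD_FORCE_PROBE),
    }


# Drive type 4 with changeline disabled and a forced probe
print(decode_fd_flags(0x34))
# Drive type 0, i.e. not set by the flags
print(decode_fd_flags(0x00))
```

A drive type of 0 would correspond to the "not set by the flags" case,
where units 0 and 1 on i386 fall back to the RTC query.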

O_NONBLOCK handling has been added to support formatting on a device
that is normally using density autoselection.  Obviously, you cannot
autoselect the density of an unformatted medium :), so O_NONBLOCK
seemed to be the best way out.  Only a limited subset of ioctls is
possible in nonblocking mode, then you need to clear the flags with
fcntl in order to perform actual IO.
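
The open-then-clear sequence could look roughly like the following
Python sketch.  It only shows the generic fcntl(2) pattern; an
ordinary temporary file stands in for the floppy device, the function
names are made up, and the driver-specific ioctls that would go
between the two steps are not shown.

```python
import fcntl
import os
import tempfile


def open_nonblocking(path: str) -> int:
    """Open without blocking (on the real device: no density autoselection)."""
    return os.open(path, os.O_RDWR | os.O_NONBLOCK)


def clear_nonblock(fd: int) -> int:
    """Clear O_NONBLOCK via fcntl so that normal I/O can follow."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


# Demonstration on a temporary file standing in for /dev/fd0
with tempfile.NamedTemporaryFile() as tmp:
    fd = open_nonblocking(tmp.name)
    try:
        # ... the limited set of ioctls (e.g. formatting) would go here ...
        flags_after = clear_nonblock(fd)
    finally:
        os.close(fd)

print("O_NONBLOCK still set:", bool(flags_after & os.O_NONBLOCK))
```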

Some things in the density structure have been changed: there
certainly won't be more than two steps between cylinders, so we can
simply fold this feature into a flag bit, where the remaining bits can
be used otherwise (currently for MFM vs. FM, and for perpendicular
recording which is needed for 2.88 media).  There's now a field that
allows specifying an offset for the sector numbers on side 2; some
very old media used to be formatted that way (and Bruce Evans still
has some :).

A number of further ioctls have been added to obtain driver and device
information.  Used in fdcontrol and fdformat.

fdcontrol(8) has been heavily rewritten.  Densities can be specified
in kilobytes, and will then be selected from per-drive type tables, or
they can be specified as a feature string (currently only explained in
fdsupport.c).  A 

Re: fatal flaw in diskcheckd...

2001-07-18 Thread Matthew Jacob


Still on by default.


On Wed, 18 Jul 2001, Alfred Perlstein wrote:

 * Matthew Jacob [EMAIL PROTECTED] [010718 16:33] wrote:
  
  So, I took a SCSI disk away. Diskcheckd started complaining. However,
  a camcontrol rescan couldn't make the disk go away until I killed off
  diskcheckd, which then closed the disk, allowing the rescan to remove it.
  Bad. Bad. Bad.
  
  ev/da4
  Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908502 on
  /dev/da4
  Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908505 on
  /dev/da4
  Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908506 on
  /dev/da4
  Jul 18 14:31:15 diskcheckd[202]: error reading 512 bytes from sector 908513 on
  /dev/da4
  Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908629 on
  /dev/da4
  Jul 18 14:31:51 diskcheckd[202]: error reading 512 bytes from sector 908636 on
  /dev/da4
  (da4:isp3:0:5:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
  (da4:isp3:0:5:0): removing device entry
 
 Is diskcheckd still on by default?  If so, can whoever enabled it
 turn it off?  If not, I'll be 'fixing' this oversight this afternoon.
 
 thanks,
 -- 
 -Alfred Perlstein [[EMAIL PROTECTED]]
 Ok, who wrote this damn function called '??'?
 And why do my programs keep crashing in it?
 





Re: fatal flaw in diskcheckd...

2001-07-18 Thread Alfred Perlstein

* Matthew Jacob [EMAIL PROTECTED] [010718 16:56] wrote:
 
 Still on by default.

In my queue then.

-- 
-Alfred Perlstein [[EMAIL PROTECTED]]
Ok, who wrote this damn function called '??'?
And why do my programs keep crashing in it?




Re: Load average synchronisation and phantom loads

2001-07-18 Thread Ian Dowse

In message [EMAIL PROTECTED], Bruce Evans writes:
>On Tue, 17 Jul 2001, Ian Dowse wrote:
>> effect in the load calculation, but even for the shorter 5-minute
>> timescale, this will average out to typically no more than a few
>> percent (i.e. the 5 minutes will instead normally be approx 4.8
>> to 5.2 minutes). Apart from a marginally more wobbly xload display,
>> this should not make any real difference.
>
>It should average out to precisely the same as before if you haven't
>changed the mean (mean = average :-).  The real difference may be
>small, but I think it is an unnecessary regression.

I meant the 5-minute average that is computed; it will certainly
not be precisely the same as before, though it will be similar.

>from 0 very fast.  Even with a large variation, the drift might not be
>fast enough.

Actually, it's not too bad with a +-1 second variation, which is
why I chose a value that large. If you plot 60 samples (60 is the
number of 5-second intervals in the 5-minute load average timescale)
you get a relatively good dispersion of points throughout the
5-second interval. Try pasting the following into gnuplot a few
times:

 plot [] [-2.5:2.5] \
 "< perl -e 'for (1..60){$a+=4+rand()*2; $o=$a-5*int(($a+2.5)/5); \
 print \"$o\n\"}'" t "1 second", \
 "< perl -e 'for (1..60){$a+=4.99+rand()*.02; $o=$a-5*int(($a+2.5)/5); \
 print \"$o\n\"}'" t "0.01 second"

It shows that while a +-1 second variation results in samples that
are usually scattered well across the 5-second interval, a +-1 tick
variation never changes the sampling point much during that time.
If you have a worst-case type load pattern such as that caused by

perl -e 'for(;;){while((time-1)%51){}select(undef,undef,undef,2.5)}'

(5-second period, 50% duty cycle) then the interference pattern
resulting from a +-1 tick variation has a period that is typically
days long! Of course the interference pattern caused by the above
script has an infinitely long period with the old load average
calculation; it always causes an additional load of 1.0 even though
the %CPU usage is approx 50%.
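
For anyone without gnuplot handy, the same experiment can be sketched
in Python.  This is a loose translation of the perl one-liners in the
plot command above, not code from the patch; the function name and the
seed parameter are assumptions added for illustration.

```python
import random


def sample_offsets(base, spread, samples=60, period=5.0, seed=None):
    """Offset of each sample from the nearest multiple of `period`.

    Each inter-sample time is base + U[0, spread), as in the perl
    one-liners: $a += base + rand()*spread; $o = $a - 5*int(($a+2.5)/5).
    """
    rng = random.Random(seed)
    t = 0.0
    offsets = []
    for _ in range(samples):
        t += base + rng.random() * spread
        offsets.append(t - period * int((t + period / 2) / period))
    return offsets


wide = sample_offsets(4.0, 2.0, seed=1)       # +-1 second variation
narrow = sample_offsets(4.99, 0.02, seed=1)   # +-0.01 second variation

print("wide   min/max: %.3f %.3f" % (min(wide), max(wide)))
print("narrow min/max: %.3f %.3f" % (min(narrow), max(narrow)))
```

With the narrow variation the offsets cannot move more than 0.6
seconds from the starting phase over 60 samples (60 * 0.01), which is
the "never changes the sampling point much" behaviour described above.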

>> The alternative that I considered was to sample the processes once
>> during every 5-second interval, but to place the sampling point
>> randomly over the interval. That probably results in a better
>
>I rather like this.  With immediate update, it's almost equivalent to
>your current method with a random variation of between -5 and 5 seconds
>instead of between -1 and 1 seconds.  Your current method doesn't
>really reduce the jitter -- it just concentrates it into a smaller
>interval.

When I tried this approach (with immediate update), I didn't like
the jumpiness of the load average. Instead of the relatively smooth
decay that I'm used to, the way it sometimes changed twice in short
succession and sometimes did not change for nearly 10 seconds was
quite noticeable. I'd be quite happy to go with the delayed version
of this, though it does mean having two timer routines, and storing
the `nrun' somewhere between samples and updates.

>hopefully rare.  Use a (small) random variation to reduce phase effects
>for such processes.  I think there are none in the kernel.  I would try
>using the following magic numbers:
>
>sample interval = 5.02 seconds (approx) (not 5.01, so that the random
>                  variation never gives a multiple of 1.00)
>random variation = 0+-0.01 seconds (approx)
>cexp[] = adjusted for 5.02 instead of 5.00

See above. I really want to try and avoid _any_ significant
synchronisation effects, not just those that are caused by the
kernel or by applications that happen to have a run pattern with
a period of N * 1.0 seconds.

Ian




Re: Userbase of -current

2001-07-18 Thread Garance A Drosihn

At 11:18 PM -0700 7/17/01, Peter Wemm wrote:
>If I had to guess, I'd put the total [genuine] -current userbase
>at between 20 and 50 people.  And many of those intentionally lag
>by a few weeks to a month or two.

At the kernel-confab at usenix, I heard some people talking about
how current wasn't really as bad as people assume it is.  I must
admit I wonder how much current is actively used.  I know I try
to build a new up-to-date current every two or three weeks, but I
don't do much more on it than test a few changes.  I am certainly
not stress-testing it.  Almost all of my real day-to-day work is
done on machines which are tracking -stable.

I have no profound comment to follow that up with, other than I'm
surprised that someone would think there are only 50 people who
are really running current.  I'm going to ask around a bit more.

-- 
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]




Re: Make world hosed ?

2001-07-18 Thread David Malone

> On Tue, Jul 17, 2001 at 07:55:18PM +0100, David Malone wrote:
> > I suspect that this is my fault for not doing a buildworld after
> > turning on WARNS stuff in inetd.
>
> YES!  Why are you committing these very easy to break the build, as
> we've seen changes w/o full `make buildworld' testing?!?

I should have been more careful, but I actually tested the change
on the i386 and alpha and checked that it didn't produce any code
changes. Unfortunately, gcc has an undocumented feature of ignoring
some warnings in system C header files. (Maybe this feature has
been there for years, but the fact that gcc gives out about system
header files is something that's caused problems for me before.)

I would have thought that any file included with

#include <...>

would count as a system header file, but it seems gcc has some
other criterion for deciding. I've managed to trace it back to cpp
writing out lines like:

# 1 "/usr/include/tcpd.h" 1 3

where the 3 at the end seems to mean a system header file. And
in tradcpp.c it seems to set a variable system_header_p if the
include is a <...> as opposed to a "...", but I haven't found out
where the 3 comes from yet.
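
For picking these lines out of preprocessed output, a small Python
sketch follows.  It assumes the linemarker layout described in the GCC
preprocessor documentation (`# linenum "filename" flags`, where flag 1
marks entering a file, 2 returning to one, 3 a system header and 4
implicit extern "C"); the function name is made up for illustration.

```python
import re

# '# <line> "<file>" <flags...>' as emitted by cpp
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+[1-4])*)\s*$')


def parse_linemarker(line: str):
    """Return the fields of a cpp linemarker, or None for other lines."""
    match = LINEMARKER.match(line)
    if match is None:
        return None
    flags = [int(flag) for flag in match.group(3).split()]
    return {
        "line": int(match.group(1)),
        "file": match.group(2),
        "flags": flags,
        "system_header": 3 in flags,
    }


print(parse_linemarker('# 1 "/usr/include/tcpd.h" 1 3'))
print(parse_linemarker('# 5 "inetd.c"'))
```

Filtering cpp output this way would show which headers the compiler
is treating as system headers for a given set of -I/-nostdinc flags.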

Ahh - I'm looking at the wrong gcc sources. The 2.95.3 sources
(which use the old gcc cpp) decide if something is a system
include based on examining a list which doesn't seem to get
initialised if you say -nostdinc. The newer gcc sources (2.96.2711
with the new cpp) seem to do the <...> vs. "..." thing.

David.




Re: Load average synchronisation and phantom loads

2001-07-18 Thread Bruce Evans

On Tue, 17 Jul 2001, Ian Dowse wrote:

> In message [EMAIL PROTECTED], Bruce Evans writes:
>
> >I think that is far too much variation.  5 seconds is hard-coded into
> >the computation of the load average (constants in cexp[]), so even a
> >variation of +-1 ticks breaks the computation slightly.
>
> I have not changed the mean inter-sample time from 5 seconds (*),
> so is this really a problem? There will be a slight time-warping
> effect in the load calculation, but even for the shorter 5-minute
> timescale, this will average out to typically no more than a few
> percent (i.e. the 5 minutes will instead normally be approx 4.8
> to 5.2 minutes). Apart from a marginally more wobbly xload display,
> this should not make any real difference.

It should average out to precisely the same as before if you haven't
changed the mean (mean = average :-).  The real difference may be
small, but I think it is an unnecessary regression.

> If the variation was much smaller than it is in the proposed patch,
> you could get a noticeable drifting in and out of phase with processes
> that have a regular run-pause pattern. Obviously this is a much

No, phase matches will be very rare in practice no matter what the
random variation is.  The average difference between the actual sampling
time for the N'th sample and (5 * N) will drift away from 0.  I think
the average of the square of this difference is proportional to N
provided the variation is uniformly distributed.  The main problem
with a small variation is that the average difference won't drift away
from 0 very fast.  Even with a large variation, the drift might not be
fast enough.

> The alternative that I considered was to sample the processes once
> during every 5-second interval, but to place the sampling point
> randomly over the interval. That probably results in a better
> synchronisation-avoidance behaviour. However, to incorporate the
> sample into the load average requires either waiting until the end
> of the interval, or updating the load average at the time of
> sampling. The former introduces a new delay into the load average
> computation, and the latter results in a lot of very noticeable
> jitter on the inter-sample interval.

I rather like this.  With immediate update, it's almost equivalent to
your current method with a random variation of between -5 and 5 seconds
instead of between -1 and 1 seconds.  Your current method doesn't
really reduce the jitter -- it just concentrates it into a smaller
interval.

> (*) Actually, I have changed the mean by 0.5 ticks, but that is a
> bug that I will fix. The 4 + random() % (hz * 2) should be 4 +
> random() % (hz * 2 + 1) instead.
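
The half-tick offset can be checked with exact arithmetic.  A small
Python sketch, assuming that the "4" stands for 4 seconds' worth of
ticks (4 * hz) and that random() % n is uniform over 0 .. n-1; the
function names are made up for illustration:

```python
from fractions import Fraction


def mean_of_modulus(n: int) -> Fraction:
    """Exact mean of an integer drawn uniformly from 0 .. n-1."""
    return Fraction(sum(range(n)), n)


def mean_interval_ticks(hz: int, modulus: int) -> Fraction:
    """Mean inter-sample time in ticks for 4*hz + random() % modulus."""
    return 4 * hz + mean_of_modulus(modulus)


hz = 100
print(mean_interval_ticks(hz, hz * 2))      # 999/2, i.e. half a tick short of 5*hz
print(mean_interval_ticks(hz, hz * 2 + 1))  # exactly 5*hz
```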

I think we can do better by making the bug a feature and using a sample
time difference of slightly more than 5 seconds, e.g. 5.01 seconds.
Then processes that wake up every second and run for less than 1 tick
would be in phase precisely every 500 or 501 seconds, which is good
(we want them to be in phase in a uniform way so that they get counted).
Processes that wake up every 1.002 seconds would then be in phase too
much, but 1.002 is much less magic than 1.000, so such processes are
hopefully rare.  Use a (small) random variation to reduce phase effects
for such processes.  I think there are none in the kernel.  I would try
using the following magic numbers:

sample interval  = 5.02 seconds (approx) (not 5.01, so that the random
                   variation never gives a multiple of 1.00)
random variation = 0 +- 0.01 seconds (approx)
cexp[]           = adjusted for 5.02 instead of 5.00
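
A rough sketch of what adjusting cexp[] would involve, assuming the table holds the decay factors exp(-interval / T) for the 1, 5 and 15 minute averages, scaled by FSCALE = 1 << 11 as in FreeBSD's <sys/param.h>. The helper names are invented for illustration.

```python
import math

FSHIFT = 11           # assumed fixed-point shift
FSCALE = 1 << FSHIFT  # assumed fixed-point scale

def cexp_table(interval_seconds):
    """Decay factors exp(-interval / T) for T = 1, 5 and 15 minutes."""
    return [math.exp(-interval_seconds / (60.0 * minutes))
            for minutes in (1, 5, 15)]

def cexp_fixed(interval_seconds):
    """The same factors as truncated fixed-point integers."""
    return [int(c * FSCALE) for c in cexp_table(interval_seconds)]

if __name__ == "__main__":
    for interval in (5.00, 5.02):
        print(interval, cexp_table(interval), cexp_fixed(interval))
```

For a 5.00 second interval the first factor is exp(-1/12); a slightly longer interval gives slightly smaller factors.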

Note: sleep(), select() and non-periodic setitimer()'s also add 1 to the
timeout.  This should help reduce phase effects in userland.
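
The phase argument for a process that wakes up every second can be checked with integer tick arithmetic. This is a minimal sketch assuming hz = 100, a process that runs only during tick 0 of each period, a first sample at tick 0, and no random variation; the function name is hypothetical.

```python
def in_phase_samples(interval_ticks, period_ticks, n_samples):
    """Indices of the samples that land on the tick where the process runs.

    Assumptions: the process runs during tick 0 of every period and the
    first sample is taken at tick 0.
    """
    return [n for n in range(n_samples)
            if (n * interval_ticks) % period_ticks == 0]

if __name__ == "__main__":
    HZ = 100  # assumed clock rate
    print(in_phase_samples(501, HZ, 301))       # 5.01 s interval, 1 s period
    print(len(in_phase_samples(500, HZ, 301)))  # 5.00 s interval, 1 s period
```

With a 5.01 second interval only every 100th sample coincides with the process, i.e. once per 501 seconds, whereas with exactly 5.00 seconds every sample does (or, with a different starting offset, none would).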

 Not another SYSINIT (all SYSINITs are evil IMO).  SI_SUB_PSEUDO is
 bogus here -- there are no pseudo ttys here.  sched_setup() is a
 good place to do this initialization.
 
 John Baldwin suggested moving the load average calculation into
 kern_synch.c, so it would certainly make sense to initialise it
 from sched_setup() then. This seems like a good idea to me; does
 that sound OK?

Yes.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Make world hosed ?

2001-07-18 Thread Bruce Evans

On Wed, 18 Jul 2001, David Malone wrote:

 I would have thought that any file included with
 
 #include <...>
 
 would count as a system header file, but it seems gcc has some
 other criterion for deciding. I've managed to trace it back to cpp
 writing out lines like:
 
 # 1 "/usr/include/tcpd.h" 1 3
 
 where the 3 at the end seems to mean a system header file. And
 in tradcpp.c it seems to set a variable system_header_p if the
 include is a <...> as opposed to a "...", but I haven't found out
 where the 3 comes from yet.

 
 Ahh - I'm looking at the wrong gcc sources. The 2.95.3 sources
 (which uses the old gcc cpp) decides if something is a system
 include based on examining a list which doesn't seem to get
 initialised if you say -nostdinc. The newer gcc sources (2.96.2711
 with the new cpp) seem to do the <...> vs. "..." thing.

I thought that it just looks at the path prefix and knows that /usr/include
is special.  It seems to use -nostdinc too.  I don't see how looking at
<...> could be right, since double-quoted includes are not wrong for
standard headers.  In practice, ``#include "tcpd.h"'' gives the same lack
of warnings as ``#include <tcpd.h>''.

Bruce





sys/dev/snp?

2001-07-18 Thread Dima Dorfman

What do people think of moving sys/kern/tty_snoop.c to
sys/dev/snp/snp.c?  It doesn't belong in kern/; I'm guessing it was
put there originally because it was dependent on some custom hacks in
tty.c, but since those are gone I think there's no reason not to put
it where it belongs.





Problem with 20010716 -current

2001-07-18 Thread Aloha Guy

Greetings all:

I was originally running 20010618 -CURRENT and all
worked well.  I then cvsupped to the latest
-CURRENT on 20010716 and did a make buildworld.
That died with something about include.h, so I
continued with just make in /usr/src, which finished.
I then did make buildkernel, make installkernel and
make installworld.  Everything was still working, but
after rebooting this is what happens...

telnetd and sshd would core dump... only
/usr/local/sbin/sshd would work and everything else as
well... But after an hour or so, the system
wouldn't allow any new processes at all.  All inetd
processes would open and close immediately, while on
the shell anything I typed would return:
no more processes

Does anyone know what I did wrong and how I can fix
this?  Would cvsup to a newer tree and doing a make
buildworld again fix things or should I do the
buildworld, buildkernel, installkernel, installworld
all in single user mode?  Thanks.

AlohaGuy




-current buildkernel fails on custom kernel

2001-07-18 Thread Vincent Poy

linking kernel.debug
linprocfs.o: In function `_linprocfs_mount':
/usr/src/sys/compat/linprocfs/linprocfs.c:748: undefined reference to
`pfs_mount'
linprocfs.o: In function `_linprocfs_init':
/usr/src/sys/compat/linprocfs/linprocfs.c:748: undefined reference to
`pfs_init'
linprocfs.o: In function `_linprocfs_uninit':
/usr/src/sys/compat/linprocfs/linprocfs.c(.text+0x1108): undefined
reference to `pfs_uninit'
linprocfs.o: In function `linprocfs_doprocstat':
/usr/src/sys/compat/linprocfs/linprocfs.c(.data+0x6c8): undefined
reference to `pfs_unmount'
/usr/src/sys/compat/linprocfs/linprocfs.c(.data+0x6cc): undefined
reference to `pfs_root'
/usr/src/sys/compat/linprocfs/linprocfs.c(.data+0x6d4): undefined
reference to `pfs_statfs'
*** Error code 1

Stop in /usr/obj/usr/src/sys/PELE.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
root@pele [8:02pm][/usr/src] 


Cheers,
Vince - [EMAIL PROTECTED] - Vice President    __ 
Unix Networking Operations - FreeBSD-Real Unix for Free / / / / |  / |[__  ]
WurldLink Corporation  / / / /  | /  | __] ]
San Francisco - Honolulu - Hong Kong  / / / / / |/ / | __] ]
HongKong Stars/Gravis UltraSound Mailing Lists Admin /_/_/_/_/|___/|_|[]
Almighty1@IRC - oahu.DAL.NET Hawaii's DALnet IRC Network Server Admin

