from:"Stephen McKay"

Re: Enhancing the user experience with tcsh

2012-02-10 Thread Stephen McKay

On Friday, 10th February 2012, Eitan Adler wrote:

-alias la   ls -a
+alias la   ls -aF
 alias lf   ls -FA
-alias ll   ls -lA
+alias ll   ls -lAF
+alias ls   ls -F

Two people didn't like these changes but didn't explain why. This is
incredibly helpful, especially for a new user.  If you dislike the
alias change please explain what bothers you about it?

You should never, ever alias over a standard command in a default profile.
It will only train new users incorrectly.  Having to use \ls to get the
real ls is not an answer.  If you think -F should be the default behaviour
of ls, commit it directly to the ls source.  Then run away fast! :-)

As for the other ls aliases, I don't see the point given lf already
exists.  My only advice for your overall .cshrc changes is to be minimal
and aim low.  You may have a chance at consensus then.  Good luck!

By the way, one of the nice things about FreeBSD vs Linux is that less
shell configuration is set up by default, so less work is needed to
undo it all before you can get your own settings done.  Every helpful
thing that is set in /.cshrc or any other global config file is something
someone somewhere will have to discover and turn off.  Try not to make
it too hard for them.

Stephen.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: cvs commit: src/sys/pci if_dc.c

2002-09-22 Thread Stephen McKay


On Friday, 20th September 2002, Martin Blapp wrote:

I think we would have to test all cases with all cards. What cards
do you have Stephen, with which clone Chipsets ? Can you make a list
of them ?

I've only got DE500 (genuine Intel 21143) and Macronix 98715AEC cards.
Nothing PCMCIA or CardBus.  Not a very big selection, I know.  A lot
of us will have to band together to test changes.

I've got somewhere another dc card which made problems. I guess
it was PNIC.

PNIC is still a problem with the -current driver.

Stephen.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re: cvs commit: src/sys/pci if_dc.c

2002-09-22 Thread Stephen McKay


On Friday, 20th September 2002, Martin Blapp wrote:

mbr 2002/09/20 08:18:13 PDT

  Modified files:
sys/pci  if_dc.c 
  Log:
  Fix the support for the AN985/983 chips, which do not set the
  RXSTATE to STOPPED, but to WAIT. This should fix hangs which
  could only be solved by replugging the cable.

John's already mentioned we are still thinking about the right way to
handle this but...

  MFC after:  2 weeks

... I thought I should explicitly mention that merging this particular
change as it stands is a bad idea because PNIC and Davicom cards (at least)
are not yet correctly handled.  The code in -stable is the old broken but
apparently harmless code.  This new code is attempting to be more correct
but breaks support for some cards.  Odd situation, no?

Stephen.


Re: dc(4) patch

2002-09-20 Thread Stephen McKay


On Thursday, 19th September 2002, John Baldwin wrote:

--- if_dc.c 4 Sep 2002 18:14:17 -   1.77
+++ if_dc.c 19 Sep 2002 20:57:03 -
@@ -1366,7 +1370,8 @@
     for (i = 0; i < DC_TIMEOUT; i++) {
         isr = CSR_READ_4(sc, DC_ISR);
         if (isr & DC_ISR_TX_IDLE &&
-            (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED)
+            ((isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED ||
+             (isr & DC_ISR_RX_STATE) == DC_RXSTATE_WAIT))
             break;
         DELAY(10);
     }

Sadly this change is insufficient to satisfy all cards.
 
The PNIC 82c169 does not idle the transmitter (stays in DC_TXSTATE_WAITEND),   
though the receiver goes idle OK.  The Davicom DM9102 does not idle the 
receiver when asked (seems to get stuck in DC_RXSTATE_ENDCHECK) though it 
stops the transmitter OK.  Your card does yet another thing.
 
I know these things through 3rd party reports, not because I have any
hardware to test.
 
So at this point I think the best idea is to do the checks only on Intel
hardware.  At least I can verify that works on a real card I can see with
my own eyes.
 
Another valid option is to send me one of every dc(4) supported card,
except genuine Intel and the Macronix 98715AEC.
   
Stephen.

PS The Intel manual says that one should check bit 8, not the receiver
state bits, to see if the receiver is idle.  That makes the test:

(isr & DC_ISR_TX_IDLE && isr & DC_ISR_RX_READ)

It doesn't help though since the uncooperative cards don't set that bit
either.  Also, I think DC_ISR_RX_READ should be spelled as DC_ISR_RX_IDLE.
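
The three candidate tests from this thread can be put side by side. What follows is a minimal sketch, not driver code: the mask and state values are stand-ins chosen for illustration (the real definitions live in if_dcreg.h), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the if_dcreg.h definitions; values are assumed. */
#define DC_ISR_TX_IDLE       0x00000002u
#define DC_ISR_RX_READ       0x00000100u  /* bit 8 */
#define DC_ISR_RX_STATE      0x000E0000u
#define DC_RXSTATE_STOPPED   0x00000000u
#define DC_RXSTATE_ENDCHECK  0x00040000u
#define DC_RXSTATE_WAIT      0x00060000u

/* Original test: transmitter idle and receiver state STOPPED. */
bool idle_strict(uint32_t isr)
{
    return (isr & DC_ISR_TX_IDLE) &&
        (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED;
}

/* Patched test: also accept a receiver state of WAIT. */
bool idle_relaxed(uint32_t isr)
{
    uint32_t rx = isr & DC_ISR_RX_STATE;

    return (isr & DC_ISR_TX_IDLE) &&
        (rx == DC_RXSTATE_STOPPED || rx == DC_RXSTATE_WAIT);
}

/* Test from the PS: transmitter idle and bit 8 set. */
bool idle_bit8(uint32_t isr)
{
    return (isr & DC_ISR_TX_IDLE) && (isr & DC_ISR_RX_READ);
}
```

Under these assumed values, a status word with the transmitter idle, the receiver in ENDCHECK and bit 8 clear (the reported Davicom behaviour) fails all three tests. So does any status word without the TX idle bit (the reported PNIC behaviour). That is why neither the patch nor the bit 8 test helps those cards.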


Re: dc(4) patch

2002-09-20 Thread Stephen McKay


On Friday, 20th September 2002, John Baldwin wrote:

On 20-Sep-2002 Stephen McKay wrote:
 Sadly this change is insufficient to satisfy all cards.

Well.  I think we can keep the check for TX going idle and just not do
the check for RX going idle.  The original code basically did this until
you submitted a patch to wpaul@ that fixed a logic bug (used || above
instead of &&) that effectively didn't do the RX idle check.

Not quite.  Davicom cards (and your card) fail to idle the receiver.
PNIC cards fail to idle the transmitter.  So it makes just as much
sense as any other idea to check those bits only on cards that document
that you have to check those bits.  My documentation only covers Intel. :-)
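
The difference between the two operators is easy to see with plain booleans. A minimal sketch (the function names are hypothetical and stand for the loop's exit condition only):

```c
#include <stdbool.h>

/* Exit condition using ||: satisfied as soon as either side is idle. */
bool wait_done_either(bool tx_idle, bool rx_idle)
{
    return tx_idle || rx_idle;
}

/* Exit condition using &&: satisfied only when both sides are idle. */
bool wait_done_both(bool tx_idle, bool rx_idle)
{
    return tx_idle && rx_idle;
}
```

With the receiver still busy, wait_done_either(true, false) is true while wait_done_both(true, false) is false. The || form therefore lets a card with a stuck receiver straight through, and the && form runs until the timeout on the same card.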

Perhaps we should do the same here?  This would be similar to what we do in
dc_tx_underrun() where we only make sure the TX is idle.

Except that the documentation states you have to idle the TX and RX to
change the full duplex bit, whereas you only have to idle the TX to
change the transmit fifo threshold.  And in dc_tx_underrun() only
the genuine Intel chips are treated specially.  Clones seem to work
without idling the transmitter.  Except the poor Davicom, which gets
reset on every underrun (if anyone has one, and it gets underruns, you
could try including it with the DC_IS_INTEL(sc) case and see what happens).

Stephen.


Re: dc(4) patch

2002-09-20 Thread Stephen McKay


On Friday, 20th September 2002, John Baldwin wrote:

On 20-Sep-2002 Stephen McKay wrote:
 Not quite.  Davicom cards (and your card) fail to idle the receiver.
 PNIC cards fail to idle the transmitter.  So it makes just as much
 sense as any other idea to check those bits only on cards that document
 that you have to check those bits.  My documentation only covers Intel. :-)

Hmm, what if we went back then to waiting until at least one of either
TX or RX went idle?  Did only waiting for one actually break any 21143
cards?

Well that's the funny thing.  It's documented to be necessary on Intel
21143 chips, but I've never seen a non-zero delay between asking for
the TX and RX to idle, and observing them to be idle.  So we could
probably delete the test-and-delay loop entirely.

Waiting for just one of them to go idle, like we have in -stable, is just
silly.  Would you test for condition A and assume that means B is OK in
any other part of the kernel?  It's really hoping that idling the TX and RX
take about the same time when there's no reason to believe that.  I think
the test in -stable is pretty much equivalent to having no test at all.

The only solid documentation I've got demands *both* must be idle.  But
that's from Intel and describes the original chips.  Hence, my view that
we should test the bits on Intel chips and forget about it on the clones.
Clones tend not to bother implementing all the limitations of the original
anyway.  If we find a clone that turns out to need the tests, we can enable
them for that clone too.
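
The "test on Intel, skip on clones" idea can be sketched as below. This is an illustration only: POLL_LIMIT stands in for DC_TIMEOUT, the callback stands in for reading and testing DC_ISR, and the fake chip exists purely so the loop can be exercised without hardware. None of these names come from the driver.

```c
#include <stdbool.h>

#define POLL_LIMIT 10000  /* hypothetical stand-in for DC_TIMEOUT */

/* Callback that reports whether TX and RX are both idle right now. */
typedef bool (*poll_idle_fn)(void *ctx);

/*
 * Poll for idle only on chips whose documentation demands it.
 * Returns the number of failed polls before idle was seen (0 means
 * idle on the first read, or check skipped), or -1 on timeout.
 */
int wait_for_idle(bool is_intel, poll_idle_fn poll_idle, void *ctx)
{
    if (!is_intel)
        return 0;               /* clones: skip the check entirely */

    for (int i = 0; i < POLL_LIMIT; i++) {
        if (poll_idle(ctx))
            return i;
        /* a real driver would DELAY(10) here */
    }
    return -1;
}

/* Simulated chip: becomes idle after a fixed number of polls. */
struct fake_chip {
    int polls_until_idle;
};

bool fake_poll_idle(void *ctx)
{
    struct fake_chip *chip = ctx;

    if (chip->polls_until_idle > 0) {
        chip->polls_until_idle--;
        return false;
    }
    return true;
}
```

A return value of 0 on real Intel hardware would correspond to the "zero calls to DELAY()" observation. A clone that never idles costs nothing here because it is never polled.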

Stephen.


Re: Uncommitted dc0 fixes ...

2002-09-09 Thread Stephen McKay


On Wednesday, 4th September 2002, Martin Blapp wrote:

And this patch, together with patch III, made the annoying messages (dc0:
failed to force tx and rx to idle mode) go away. And I can now use my card
without replugging the cable over and over.

I've been meaning to remove the annoying message for ages.  Sorry about that.

+    if (DC_IS_INTEL(sc)) {
+        for (i = 0; i < DC_TIMEOUT; i++) {
+            isr = CSR_READ_4(sc, DC_ISR);
+            if (isr & DC_ISR_TX_IDLE &&
+                (isr & DC_ISR_RX_STATE)
+                == DC_RXSTATE_STOPPED)
+                break;
+            DELAY(10);
+        }
+    }

Conditionalising on DC_IS_INTEL() means most cards no longer wait until
the TX and RX are idle.  I don't have enough different if_dc cards to
know if this is safe.

On the other hand, every test I've done on my Intel and Macronix cards
shows zero calls to DELAY() in this loop.  The loop may as well not be
there for those card types.

Indeed, it isn't there at all in if_de and in a Linux driver I looked
at.  From this I'm guessing that no 21143 (real or clone) needs this check,
though I've got no real proof.

Out of all this fuzzy evidence, I guess the most sensible option is the
patch you've proposed.  If nobody else is interested, I'll commit this part
of your patch cluster on the weekend.  I suppose I could do the ADMtek
auto tx underrun recover patch too, as it seems harmless to other cards.
The other stuff I can't test at all.

This driver represents a counterintuitive state of affairs.  I was impressed
when Bill Paul managed to support so many clone cards with one driver.  But
now nobody has enough hardware on hand to test any change properly.  There's
some sort of lesson to be learnt here.

Stephen.


Re: if_dc broken in -current

2002-03-26 Thread Stephen McKay


On Monday, 25th March 2002, Ilmar S. Habibulin wrote:

On Mon, 25 Mar 2002, Stephen McKay wrote:

 What sort of card do you have?  The output of dmesg would help.  Have you
 tried 4.5 on this machine?
I have some noname nic with Intel 21143 chip. dmesg attached. I'm using
only trustedbsd_mac branch on my ws.

Yours seems to be the same as mine (from a chip and phy point of view)
although mine has a DEC assigned ethernet address and yours is from
Telebit.  I don't think that difference matters.

 Of course the dc driver should autonegotiate (and does so when I revert
 rev 1.56).  Your info could help trace this problem.
Well, I don't think this is the problem. Hardware has become too
intelligent nowadays, so one has to use one's own hands to make this
hardware work the way the user wants it to work. Maybe just put some FAQ about
dc(4) and autoconfigurable hubs/switches?

Some things can be blamed on attempted intelligence gone wrong.  But not
this one.  This is a simple bug.  My card works perfectly under 4.5.0
on the same machine.  It fails with -current.  But with one change
reverted, it works again.  Now all I have to do is work out what is
the real underlying cause, since the current code looks right at first
glance.  At least I have the old DEC datasheets, and some info on some
of the clones.

Stephen.


Re: if_dc broken in -current

2002-03-26 Thread Stephen McKay


On Monday, 25th March 2002, Robert Watson wrote:

I think I have an identical problem involving a Linksys ethernet card
using if_dc.  I have to force it to negotiate 10mbps, since it fails to
negotiate anything higher with my 10/100 switch.  No idea why at all.

dc0: <LC82C115 PNIC II 10/100BaseTX> port 0xe800-0xe8ff mem
0xfebfff00-0xfebf irq 10 at device 19.0 on pci0
dc0: Ethernet address: 00:a0:cc:35:3e:56
miibus0: <MII bus> on dc0
dcphy0: <Intel 21143 NWAY media interface> on miibus0
dcphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto

dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet6 fe80::2a0:ccff:fe35:3e56%dc0 prefixlen 64 scopeid 0x1 
inet 192.168.11.150 netmask 0xff00 broadcast 192.168.11.255
ether 00:a0:cc:35:3e:56 
media: Ethernet 10baseT/UTP
status: active

If I set it to auto-negotiate or hard-set to 100mbps, no packets go back
or forth.  I've had this problem for at least a year, if not longer.  I
have the same problem with 4.4-STABLE using an identical card on different
hardware: if it tries to negotiate 100mbps, then it simply doesn't work.
If I force it to 10, it's fine.

After careful consideration, I think this has to be a different problem.

My problem is that auto-negotiation doesn't start at boot (when an address
is assigned to dc0).  If I explicitly set a speed, that speed works.  Most
bizarrely, if I misspell the media option, that causes a successful
autonegotiation!  I mean, I type "ifconfig dc0 media 10baset" immediately
after boot, and autonegotiation takes over.  (If I spell it "10baset/utp"
it goes into 10Mbit half-duplex mode, like you expect.)  So it's just a
hair's breadth away from working properly, and reverting rev 1.56 is enough
for full operation to be restored.

Since you explicitly set 100Mbit half-duplex and it doesn't work, then that
must be something else.  We could have a go at finding that bug too, but
it will be harder, since I don't have a PNIC II here.  I do have some info
on the Macronix 98715A, which Bill Paul says is almost the same.  Maybe
we can get lucky.

Stephen.


Re: if_dc broken in -current

2002-03-25 Thread Stephen McKay


On Friday, 22nd March 2002, Ilmar S. Habibulin wrote:

On Sat, 23 Mar 2002, Stephen McKay wrote:

 It's been quite a while since I updated my -current box, but when I did,
 I was surprised to find that my DE500 network card (21143 chip) had stopped
 working.  The switch showed no link.  Ifconfig showed no carrier.

I've had a similar problem. Now I have the media option set to the needed
value in the ifconfig_dc0 variable. This helped.

What sort of card do you have?  The output of dmesg would help.  Have you
tried 4.5 on this machine?

Of course the dc driver should autonegotiate (and does so when I revert
rev 1.56).  Your info could help trace this problem.

Stephen.

PS I'm now assuming the number of -current users that use PNIC and Davicom
cards with the dc driver is exactly zero.  Oh well.


if_dc broken in -current

2002-03-22 Thread Stephen McKay


It's been quite a while since I updated my -current box, but when I did,
I was surprised to find that my DE500 network card (21143 chip) had stopped
working.  The switch showed no link.  Ifconfig showed no carrier.

After some fiddling, I reverted revision 1.56 (removal of mii_pollstat call)
of sys/pci/if_dc.c and the DE500 went back to normal.  It auto-negotiated
100Mbit full duplex, and now works fine.

I expect the problem is actually in mii/dcphy.c but since I have very little
understanding of how this mii stuff is supposed to work, I have to leave that
to others.  If no one is available to give me a hand here, I'll have to
go with plan B which is to simply back out rev 1.56 of if_dc.c.  (That's
not such a bad plan really, just slightly inefficient.)

On a different dc driver note, I'm interested in knowing if anyone is using
either a PNIC or Davicom with -current.  There is a slight difference between
-current and -stable, and the code in -current caused problems with PNIC and
Davicom cards when it was briefly in -stable.  I'm assuming that nobody is
using such cards, and the little bit of code is going to annoy a few people
when they try the 5.0 prerelease.  I'd like to fix this before it causes
too much trouble.

For those who are curious, the troublesome piece of code is lines 1339 and
1340 (in rev 1.69):

    if (isr & DC_ISR_TX_IDLE &&
        (isr & DC_ISR_RX_STATE) == DC_RXSTATE_STOPPED)

which waits for confirmation that the transmitter and receiver are both
idle before some configuration registers are fiddled with.  With PNIC
and Davicom cards, one or the other of these conditions never occurs.
Or at least that was the trouble when this was in -stable, back in August.
Could this problem have magically gone away?

Stephen.


Re: Another tweak to burncd msinfo

2002-01-05 Thread Stephen McKay


On Saturday, 5th January 2002, Søren Schmidt wrote:

It seems Stephen McKay wrote:
 Now that burncd msinfo returns the correct values I noticed another small
 problem: it displays the result on stderr instead of stdout.

Hmm, that was intentional...

Could you explain why?  The most obvious practical use would be:

$ mkisofs -r -C `burncd msinfo` -M /dev/acd0c -o new.iso goodies

Writing to stderr means this doesn't work, and you have to add 2>&1 to it.
Also the white space means you have to use extra quoting.
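
The same behaviour shows up outside the shell. Backquotes, like popen(3) in "r" mode, collect only what the command writes to stdout. The helper below is a hypothetical sketch for POSIX systems, not part of burncd, and it uses echo as a stand-in for any command.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/*
 * Run cmd through the shell and copy what it writes to stdout into buf.
 * Returns the number of bytes captured, or -1 if the command could not
 * be started. Anything the command writes to stderr is not captured.
 */
long capture_stdout(const char *cmd, char *buf, size_t size)
{
    FILE *p;
    size_t n;

    if (size == 0)
        return -1;
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    n = fread(buf, 1, size - 1, p);
    buf[n] = '\0';
    pclose(p);
    return (long)n;
}
```

Capturing "echo 42" yields three bytes, while capturing "echo 42 1>&2" yields none because the text went to stderr. Wrapping the latter as "(echo 42 1>&2) 2>&1" brings the text back, which is the extra redirection a caller is forced to add when a tool reports its result on stderr.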

 Can I commit the obvious patch?

Could you just hang on for now, since I'm doing large changes to
burncd just now in order to support other things, and keeping
everybody's changes to the stock sources is not making things 
easier...

Are these changes intended for 4.5?  I'm hoping the small change I
proposed would be accepted into 4.5, before anybody starts using
burncd msinfo in practice.  I think this is sensible, even if
a much improved burncd is scheduled for 4.6.

Regardless of this, I do not intend to commit any unwelcome changes.

Stephen.


Re: Another tweak to burncd msinfo

2002-01-05 Thread Stephen McKay


On Saturday, 5th January 2002, Søren Schmidt wrote:

It seems Stephen McKay wrote:
 
 Are these changes intended for 4.5?  I'm hoping the small change I
 proposed would be accepted into 4.5, before anybody starts using
 burncd msinfo in practice.  I think this is sensible, even if
 a much improved burncd is scheduled for 4.6.

You should ask permission from the release engineer to commit it
to 4.5, but it really should be committed to -current first.

Of course!  But given how simple the change is, just a couple of days
in -current would be sufficient testing.  I am asking your approval
to commit to -current, then I'll ask the REs about -stable.

Does this mean you've decided that it is a beneficial change and won't
interfere with your other work?

Stephen.


Re: Another tweak to burncd msinfo

2002-01-05 Thread Stephen McKay


On Saturday, 5th January 2002, Søren Schmidt wrote:

I forgot to say that I already committed the change to current...

:-)

I try to keep up with -current, but that's too current for me!

I'll hassle the REs tomorrow about permission to merge.

Thanks,

Stephen.


Fix for broken burncd msinfo PR#27593

2001-12-25 Thread Stephen McKay


A number of people have complained that burncd msinfo returns the wrong
value when there are already multiple sessions on a CD.  This is true,
and is bug bin/27593.

Since I burn a lot of multisession CDs, and have been working out the mkisofs
-C values by hand with the help of cdcontrol info, I thought now would be a
good time to fix this bug.

Unfortunately, I've found that burncd won't work with SCSI burners, and
the only ATAPI burner I have is at work, and well, it's Christmas and all
that.  So this is completely untested, though I believe it should work.

I hope this can make it into 4.5.

Stephen.

PS How much work would it be to add the CDRIO* ioctls to the SCSI cd driver?


Index: burncd.c
===
RCS file: /cvs/src/usr.sbin/burncd/burncd.c,v
retrieving revision 1.19
diff -u -r1.19 burncd.c
--- burncd.c    2001/12/24 03:20:10 1.19
+++ burncd.c    2001/12/25 13:45:48
@@ -149,10 +149,14 @@
                 break;
             }
             if (!strcasecmp(argv[arg], "msinfo")) {
+                struct ioc_toc_header header;
                 struct ioc_read_toc_single_entry entry;
 
+                if (ioctl(fd, CDIOREADTOCHEADER, &header) < 0)
+                    err(EX_IOERR, "ioctl(CDIOREADTOCHEADER)");
                 bzero(&entry, sizeof(struct ioc_read_toc_single_entry));
                 entry.address_format = CD_LBA_FORMAT;
+                entry.track = header.ending_track;
                 if (ioctl(fd, CDIOREADTOCENTRY, &entry) < 0)
                     err(EX_IOERR, "ioctl(CDIOREADTOCENTRY)");
                 if (ioctl(fd, CDRIOCNEXTWRITEABLEADDR, &addr) < 0)
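
To show what the added entry.track assignment is after, here is a toy in-memory model of a table of contents. It does not talk to a drive or use the CDIO ioctls. The struct layout, the function names and any sample addresses are made up for illustration.

```c
/* Toy stand-ins for a TOC header and its per-track entries (hypothetical). */
struct toy_toc {
    int first_track;        /* plays the role of header.starting_track */
    int last_track;         /* plays the role of header.ending_track */
    const long *track_lba;  /* start address of each track, in order */
};

/* Start address of the given track, or -1 if the track is out of range. */
long toy_track_start(const struct toy_toc *toc, int track)
{
    if (track < toc->first_track || track > toc->last_track)
        return -1;
    return toc->track_lba[track - toc->first_track];
}

/* Start address of the last track, which is what the patch asks for. */
long toy_last_track_start(const struct toy_toc *toc)
{
    return toy_track_start(toc, toc->last_track);
}
```

In this model, a lookup keyed on the ending track from the header moves forward each time a track is appended. A lookup with a fixed track number keeps returning the same address however many tracks are added.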


Implications of stdio changes (was Re: cvs commit: src/include stdio.h src/lib/libc Makefilesrc/lib)

2001-08-20 Thread Stephen McKay


On Tuesday, 14th August 2001, Daniel Eischen wrote:

  So do we allow FILE to be extended only after bumping the library
  version once (after 5.0-release)?  And thereafter all extensions to
  FILE do not need a version bump?
 
 We've already bumped libc for 5.x.  Assuming this works ok, we shouldn't need
 any further bumps for extending FILE.

True.  I guess the real problem is the other libraries that reference
stdin, stdout, stderr.  These need to be rebuilt with the new stdio.h
and libc in order to avoid any impact from future FILE changes.

I might sound like the harbinger of doom, but you have to bump the major
number on every library that uses stdio to solve the "FILE has changed
size" problem.  It's the same sort of problem that changing errno caused.
That was solved by the switch to elf, which caused global recompilation.

People are hoping to do this by just waiting.  Eventually most libraries
will experience a major version bump.  Similarly, most useful programs will
be recompiled (either against bumped libraries, or recompiled old ones).
But some programs will not be recompiled, and will fail in mysterious ways.
I often use really old binaries, so odds are it will happen to me. :-)

To prevent old binaries from going bad, the libraries they link to must
use the old version of stdio.  Definite ideas of the offset in __sF of
stdout and stderr are embedded in both the old programs, and the old
libraries (and of course, the old version of stdio).  If you recompile
the libraries against the new stdio, you break the old binaries.  The
solution is to not do that.

In short, when FILE changes size (and hence __sF offsets change), then
every consumer(*) of stdio must be bumped.  The recent __stdinp (and friends)
addition prevents this problem happening again in the future, but does not
solve the current problem of old binaries and old libraries knowing the
internals of stdio.

Stephen.

(*) OK, technically only uses of stdout and stderr variables screw up
when FILE changes size.  Uses of macros (like getc variants that are
sometimes macros) will screw up if offsets change, but that's easier
to avoid.
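
The offset problem can be shown with two made-up struct layouts. In BSD stdio, stdin, stdout and stderr are elements 0, 1 and 2 of the __sF array. Everything else here (the field names, the struct names, new_sF and new_stdoutp) is hypothetical and only illustrates the arithmetic.

```c
#include <stddef.h>

/* Two hypothetical layouts of FILE: the second has grown by one field. */
struct file_v1 { int fd; int flags; };
struct file_v2 { int fd; int flags; long extra; };

/*
 * Byte offset of element `index` in an array of objects of file_size
 * bytes. This is the number that ends up baked into code compiled
 * against a header that spells stdout as &__sF[1].
 */
size_t sf_offset(size_t file_size, int index)
{
    return file_size * (size_t)index;
}

/* The newer scheme: the library publishes a pointer, fixed up at run time. */
struct file_v2 new_sF[3];
struct file_v2 *new_stdoutp = &new_sF[1];
```

Index 0 sits at offset 0 under both layouts, which matches the footnote that only stdout and stderr go wrong. Indices 1 and 2 move as soon as the struct grows, so an old binary or old library keeps using the old offsets into a new array. Code that loads a pointer such as new_stdoutp never computes an offset itself and is unaffected by later growth.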


Re: Whatever happened to CTM?

2001-03-22 Thread Stephen McKay


On Thursday, 22nd March 2001, Bruce Evans wrote:

On Wed, 21 Mar 2001, Stephen McKay wrote:
 On the contrary, I prefer CTM over CVSup, even on a fast connection (which
 I don't currently have).  On a slow or intermittent connection, CTM beats
 CVSup by a large margin.

I'm not sure about that.  CTM may be faster, but it works less
automatically, especially when it breaks, and it breaks often, at both
the server and client levels (mainly downtime problems for the server
and disk-full problems for the client).  I used to use it until the
server broke one time too many last year.

CTM's advantages outweigh the disadvantages for me.  I don't run out of
disk space(*), and the server failures have been rare.  Certainly, the
reliability of CTM delivery exceeded the reliability of all of the M$
systems the guys in the neighbouring cubicles managed at my previous
employer.  Until now, of course.

What we need now is someone to supply hardware and some connectivity.  I still
think CTM has sufficient advantages to justify its continued existence.

I think the project should fund it.

Stephen.

(*) The tangles you get into after ctm croaks from lack of disk space were
supposed to have been fixed.  I don't think they have been.  It shouldn't
be too difficult though.  All those md5 checksums make repairs trivial to
automate, in theory.



Re: Whatever happened to CTM?

2001-03-21 Thread Stephen McKay


On Tuesday, 20th March 2001, Ulf Zimmermann wrote:

On Mon, Mar 19, 2001 at 04:53:33PM -0800, John Baldwin wrote:
 
 On 20-Mar-01 Michael C . Wu wrote:
  For all connections greater than 9600baud modems, we recommend
  using CVSup to get src-all and ports-all updated. At the worst case, 
  be able to CVSup a ports-all collection within an hour, with heavy
  packet loss and low bandwidth.
  
  i.e. CTM sucks, don't use it. :)

On the contrary, I prefer CTM over CVSup, even on a fast connection (which
I don't currently have).  On a slow or intermittent connection, CTM beats
CVSup by a large margin.

 cvsup is not available via e-mail for those who may only have e-mail access
 for one reason or another.

Firewalls make CTM style delivery essential.  (No, Stefan, I don't like
your tunneling idea. :-)

I have been hosting the machine which ran ctm,

And many thanks indeed for your service!

unfortunately my provider
cut me off and I just got some access back, but not for the location
the ctm machine is located at.

At this time I do not know yet when it will have access again.

Surely FreeBSD Inc (or whatever it is that owns the freebsd.org machines)
could spring for a box.  Assuming Ulf is still keen, it shouldn't be too
hard for him to remote administer it.

Stephen.


Fixing a.out compatibility

2000-12-26 Thread Stephen McKay


I'll try to summarise the position so far:

1) Legacy a.out executable support is broken for a subset (size unknown)
of such executables.

2) We can ignore this or repair this.

3) We can build a new binary or just look around on old 3.x CDs until
we find one that works.

4) We can generate a working binary on 4.x or on 2.2.8-stable (after some
fixing).

5) We can generate ld.so anew each release, or generate it (or find it)
once and commit a binary.

I don't think there's any doubt about point 1.  All a.out executables that
use libc.so.2.2 and another recompiled library will fail because of a
missing routine (__error) required by the recompiled library and not
supplied by libc or by executable or by the existing ld.so.
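
The failure can be pictured as a lookup that runs out of places to search. The sketch below is a toy model of run-time symbol resolution, not the rtld-aout algorithm. The struct, the function and any symbol lists fed to it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A provider is anything that can supply symbols at run time (toy model). */
struct provider {
    const char *name;
    const char *const *symbols;   /* NULL-terminated list */
};

/* Return the name of the first provider exporting sym, or NULL if none does. */
const char *resolve(const struct provider *provs, size_t nprovs,
    const char *sym)
{
    for (size_t i = 0; i < nprovs; i++)
        for (const char *const *s = provs[i].symbols; *s != NULL; s++)
            if (strcmp(*s, sym) == 0)
                return provs[i].name;
    return NULL;
}
```

If the search set is the old libc, the executable and the existing ld.so, and none of them lists __error, the lookup returns NULL and the recompiled library cannot be loaded. Adding __error to any one provider in the set, such as a newer ld.so, makes the same lookup succeed without touching the executable.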

All these executables come from the 2.2.x era or earlier.  Those built in
the 3.x era use libc 3.1 and don't have this problem.  Urk...  Actually,
it's slightly more complicated than that since the libc.so.3.1 built on
2.2.6 (for example) didn't contain __error() but the one built on 2.2.7
did.  (At least according to the cvs logs).  I'm most annoyed that I can't
find my 2.2.6 CDs.  2.2.5 had libc 3.0 (without __error) and 2.2.7 had
libc 3.1 (with __error) but the cvs logs say that 2.2.6 should have had
a different libc 3.1 (without __error).  So, the exact "version" of
version 3.1 of libc could be important.  Yuck.

We don't normally ignore things we can fix, so point 2 is resolved in
favour of fixing this, right?

We need to build a new binary since we (collectively) have forgotten
where the working 3.0 through 3.2 binaries came from. :-(  Can we,
for example, prove that revision 1.57 made it into any release?

It seems feasible to generate a new binary on a recent or an old patched
FreeBSD version.  The question is which is better.  I think the newer
the better.  Otherwise, who is going to build the 2.2.8-stable box
to make this one binary?  I've already built a binary on 4.2-release
that works.

We disagree a bit over point 5.  I think it is feasible and desirable
to build ld.so at each release.  If we don't build it for each release,
how will fixes to rtld-aout and required libraries (eg libc) be incorporated?
I say keep building it fresh until a.out builds are impossible.  Or are
you suggesting that each advance in 4.x and beyond be backported to
2.2.8-stable so that we can build one binary?

So, where to from here?  Despite all my arguments, I could just commit
the binary I have to the lib/compat2* areas and leave it at that.

Stephen.

PS Thanks for all the "old_RELENG_2_2" etc tags now available in rtld-aout.



Re: Fixing a.out compatibility

2000-12-26 Thread Stephen McKay


[Noted that you don't like being cc:'d, David.  On the other hand,
I like to be kept in the cc: list.]

On Tuesday, 26th December 2000, "David O'Brien" wrote:

On Wed, Dec 27, 2000 at 02:01:24AM +1000, Stephen McKay wrote:
 I'll try to summarise the position so far:
 
 1) Legacy a.out executable support is broken for a subset (size unknown)
 of such executables.

Define "legacy".  I have been speaking specifically about FreeBSD 2.2
support.  That just happens to be a.out based.

You seem to mean it to be any a.out binary.

Not really.  If I generate an a.out binary right now, it can't suffer
from this problem, even though it uses ld.so when it runs.  Only a
certain set of old a.out binaries are affected.

From my standpoint only bits generated on a 2.2.x host can go into the
compat22 distribution.  When compat1x was created (being the first it
gets to imply the intention of the compat dists) it gave the ability to
run FreeBSD 1.x binaries; not 2.0 a.out ones, not any binary after the
last 1.x release.  Thus why I claim compat22 is *not* about being an
a.out compat dist, but one to properly run 2.2.x binaries on a later
version of FreeBSD.  If 3.0 had been a.out based, there still would be a
compat22 dist.

We almost completely agree, but...

The only a.out binaries with problems come from that 2.2/2.1 era.  To
support them we need an ld.so from *after* that era.  I can't see how you
get around this.  That working ld.so was in 3.0 and was certainly not
generated on a 2.2.x host.  I think your restriction on compat contents
is a useful guideline, but to be broken when necessary.

Thus someone that still has access to a 2.2.8-stable box needs to merge
the changes in src/libexec/rtld-aout (in -current) to
src/gnu/usr.bin/ld{,/rtld} and build a new binary for inclusion in the
compat22 dist.

I'll build one if I have to.  I'm trying to avoid unnecessary work, since
I expect there are few others bothered enough to fix this problem.

Note, when the bits were CVS repo copied into rtld-aout, all the tags
were stripped.  I spent the time to add them all back to make the merge
easier for someone.  Whoever does this should please CVSup before
starting.

Could very well be me.  But I would be patching the old location, surely?

 I don't think there's any doubt about point 1.  All a.out executables that
 use libc.so.2.2 and another recompiled library will fail because of a
 missing routine (__error) required by the recompiled library and not
 supplied by libc or by executable or by the existing ld.so.

Agreed, but "and another recompiled library", means this a.out
executable was not built on a 2.2.x host.  Otherwise there would be no
way to have this inconsistency.

This is the fundamental point of this problem.  The executable was built
on a 2.2.x or 2.1.x box and originally used libraries compiled then or
earlier.  The whole problem is the fact that libraries were recompiled
later and did not change version numbers.  There was no way to force
external parties to update version numbers, and folks round here didn't
feel like bumping all the FreeBSD library version numbers.

This is why I keep the words "executable" and "library" separate.  The
library is newer than the executable, and this causes the executable
to fail.  This is the fact that I'm not at all sure that you understand.

Actually one problem is I put the 2.2.8 ld.so in the compat2[01] dist.
That was wrong of me.  I can correct that.  SimCity (the binary used as
an example) required me to install the compat20 and compat21 dists.  The
other problem is we don't have a compat2[01] XFree86 libs dist.  We only
have an a.out one that is intended to cover all a.out binaries, and it
doesn't correctly.

We can only install one ld.so.  It has to cover all bases.  Are you
suggesting that each compat2x dist install a different ld.so?  This
is consistent with your claim that "compat2x bits come from 2.x", but
not very useful in practice.  Should I assume you meant to delete
ld.so from all but one compatxx dist?

 2.2.5 had libc 3.0 (without __error) and 2.2.7 had libc 3.1 (with
 __error) but the cvs logs say that 2.2.6 should have had a different
 libc 3.1 (without __error).  So, the exact "version" of version 3.1 of
 libc could be important.  Yuck.

The compat22 dist used the 2.2.8 bits, so I don't see how it wasn't the
``exact "version" of version 3.1 of libc''.

What I was going on about here is that important changes occurred to
libraries without a version bump, and one such library was libc.  It
is making my attempt to describe the boundary of the problem very
difficult.  I can't predict what happens when you run an old a.out
binary linked against the "version" of version 3.1 of libc that didn't
have __error in it.  This sort of confusion is what the versioning
system was supposed to prevent.

 We need to build a new binary since we (collectively) have forgotten
 where the wor

Re: Is compatibility for old aout binaries broken?

2000-12-20 Thread Stephen McKay


On Wednesday, 20th December 2000, "David O'Brien" wrote:

On Mon, Dec 18, 2000 at 02:58:16AM +1000, Stephen McKay wrote:
 This has been broken for new users for some time. :-(  Those of us
 upgrading from source have been immune to this problem, because we
 retain the old a.out ld.so binary.
 
 /usr/libexec/ld.so: Undefined symbol "___error" called from
 sim:/usr/X11R6/lib/aout/libX11.so.6.1 at 0x20160644

 When errno became a function that returns a pointer (previously it was
 a simple integer variable), recompiled libraries became incompatible with
 old binaries.  So, I hacked the a.out loader (ld.so).  The fix was in 3.0.
 Well, Nate called it a horrible hack, so maybe I should say "the hack was
 in 3.0".

src/lib/libc/sys/__error.c suggests this was the case for 2.2.7+.

No, you want rev 1.10 of sys/sys/errno.h.  That was when it affected all
a.out binaries.  Until then it was just threaded binaries, a vanishingly
small proportion.  Rev 1.10 was in 3.0.  Rev 1.5 was in the 2.2.x releases.
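
A minimal sketch of the errno change under discussion, for illustration
only.  The names my_error, my_errno, errno_cell and lib_fail are
hypothetical stand-ins, and the single static cell glosses over per-thread
storage.  It shows why a library compiled against the function-style errno
needs a symbol that an older libc, which exports only a plain integer,
never supplied.  (a.out prepends an underscore to C symbol names, which is
why the loader reports ___error rather than __error.)

```c
#include <assert.h>

/* Old style: errno is a plain global integer exported by libc. */
int old_style_errno;

/* New style: a function hands back the address of the errno cell.
 * One static cell stands in for whatever storage the real libc uses. */
static int errno_cell;

int *my_error(void)            /* hypothetical stand-in for __error() */
{
    return &errno_cell;
}

/* Every use of my_errno now compiles into a call to my_error(). */
#define my_errno (*my_error())

/* A routine from a "recompiled library".  It reports failure through the
 * function-style errno, so it cannot link unless my_error() exists. */
int lib_fail(int code)
{
    assert(code != 0);
    my_errno = code;
    return -1;
}
```

An old executable paired with an old libc supplies only the integer
variable, so something else has to provide the function.  How the real
rtld-aout patch does that is not shown here.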

What is out of sync is the X11 a.out libs.  They are probably built on a
2.2.7 or 2.2.8 box, thus they refer to `___error' vs. `errno'.  These
libs are wrong for the SimCity binary.  They are a.out yes, but not
proper for compat20 use.  Since SimCity needs `libgcc.so.261', I'll
assume it was built that long ago.

Correcting slightly for your slightly off assumption: The X11 libs were
probably built on a 3.x box.  Their problem is that being newer than
libc.so.2.2 (or was it libc.so.3.0) they use ___error but libc does not
supply it.  My patches to rtld-aout (that first appeared in FreeBSD 3.0)
supply ___error in this case.
This is the only full fix for this situation.

The problem isn't as much ld.so, as it should match the libc.so, et al.
you are using from the compat2[01] dist (needed to satisfy ``ldd
lib/SimCity/res/sim'').  And `ld.so' and the shared libs would be
consistent on the system the a.out program was built on.

There was an enormous thread in -current (I think) at the time (mid 1998).
The end result was that the ld.so hack was the only solution other than
mandating a major bump to every library in existence.  Nobody liked either
of those solutions :-) but I put the ld.so hack in and the problem disappeared.
Emphasis again: the workaround ld.so was only found in 3.0 and onward, so
just using a 2.2.x ld.so isn't enough.

What I would feel most comfortable with, is doing a MFC to RELENG_2_2 of
the rtld-aout changes since then, building a new `ld.so' and putting that
in the compat2? dists.  Problem is I don't have access to a 2.2-STABLE
box.

I have built a binary on 4.2-RELEASE.  I think I prefer that because any
security fixes in libc (or whatever) will be reflected in the resulting
ld.so.  In fact, I think we should build ld.so from source until such
time as a.out building capability is removed (5.0 perhaps).

On the other hand, merging back to 2.2.x and rebuilding should provide
a working (and hack enabled) ld.so that has no more problems than the
old binaries it is supporting.

 I poked about with my old FreeBSD CD collection and found that
 version 3.0 through 3.2 have a fully functioning (fully hack enabled)
 ld.so, but an older binary has been substituted in 3.3 and onward,
 including 4.0 and 4.1, and most likely 4.2 also.

Are you sure?  src/lib/compat/compat2[012]/ld.so.gz.uu are all at
rev 1.1.  So there has been no change to them over the lifetime of their
existence.  All three are identical -- having the same MD5 checksum.
Well, looking at the release tags compat22/ld.so was in 3.2.
compat2[01]/ld.so was added for 3.3.

This very fact is bothering me a lot.  Get out your 3.2 disks and verify
that they do not match these uuencoded binaries.  Check the 3.0 and 3.1
disk 2 (live file system) and see that they don't match them either.

 I can only guess that some anonymous release engineer (nobody we know :-)
 picked the wrong CD at some point to get the master copy of ld.so once
 it stopped compiling.  (Or at least stopped being easily compiled.)

Not quite.  I seem to remember that JKH was making a tarball of a.out
libs from whatever was on his box at the time (thus probably the last
a.out ld.so just before E-day on 3-CURRENT).

Something like this must have happened up to and including the 3.2 release.

When I committed the
compat2? bits, I took ld.so from a 2.2.x release as this is the compat2?
dist, not compat3.aout dist.  Which is what you're suggesting should have
been done.

You missed the fact that fixes were added to ld.so after those releases
even though the purpose of ld.so is to run binaries that date from those
releases.  The existence of later, recompiled libraries requires this.

Stephen.

PS In just a few hours, I'll be out of the picture for 4 or 5 days.
   I hope I've given you a complete understanding of the situation
   in the event that I don't get to commit anything.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsub

Re: Is compatibility for old aout binaries broken?

2000-12-20 Thread Stephen McKay


On Wednesday, 20th December 2000, "Donald J . Maddox" wrote:
  Looks good.  Can you install the XFree86-aoutlibs port?  You may have
  seen where someone posted the a.out libs from 3.3.6 are known to not be
  the best to use for compatibility use.

Interesting.  After I installed the XFree86-aoutlibs port, SimCity
works fine for me (on an 8-bit display)...

It didn't work with the X libs built by the port when aout libs
are requested, and it didn't work with the X libs from 3.3.6, but
it works with these.

If the XFree86-aoutlibs libraries are old enough, they will not call ___error.
That is sufficient to solve your particular problem, but not to solve the
general case.  

I'm now wondering if the reason that people don't like the XFree86 3.3.6
a.out libraries is the problem with ___error and the older ld.so supplied
with recent FreeBSD releases.

Stephen.



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: Is compatibility for old aout binaries broken?

2000-12-20 Thread Stephen McKay


On Wednesday, 20th December 2000, "Donald J . Maddox" wrote:

On Wed, Dec 20, 2000 at 10:14:09AM -0800, David O'Brien wrote:
 On Wed, Dec 20, 2000 at 11:15:55PM +1000, Stephen McKay wrote:
  Correcting slightly for your slightly off assumption: The X11 libs were
  probably built on a 3.x box.  Their problem is that being newer than
  libc.so.2.2 (or was it libc.so.3.0) they use ___error but libc does not
  supply it.  My patches to rtld-aout (that first appeared in FreeBSD
  3.0) supply ___error in this case.  This is the only full fix for this
  situation.
 
 Why is not changing the XFree86-aoutlibs port to offer libs built on
 2.2.x not the right fix?

I was under the impression that this was already the case...  The libs
in the XFree86-aoutlibs port ARE from 2.2.x.  My problem was that I
was using libs built on 3.x.

(I think I can save a lot of typing by replying to this message.  I'm just
about to leave town.)

My whole point is that generating a.out binaries and libraries didn't stop
the instant that 3.0 hit the streets.  To support the mixture of old binary
plus new library you need a hacked ld.so.  We have to supply it somehow,
or simply say we don't care about certain binaries dying with obscure
error messages.  This XFree86-aoutlibs vs libs built on 3.x example supports
my theme.

I can't reconcile your naming convention (ie compat22 bits originated on
a 2.2.x box) with my version (compat22 is used to support 2.2.x binaries).

I'm also not afraid that a binary generated on 4.2 would have hidden
defects.  I'm more worried that one generated on 2.2.x would have defects
we've forgotten about.

If you don't mind pausing the whole argument for about 4 days, I can
rejoin.  :-)

Stephen.




Re: No cable modems??

2000-12-19 Thread Stephen McKay


On Tuesday, 19th December 2000, "Donald J . Maddox" wrote:

Why are you (or your ISP) refusing to accept mail from people
with cable modems?  Enquiring minds want to know... ;-)

   - Transcript of session follows -
... while talking to frmug.org.:
 MAIL From:[EMAIL PROTECTED]
 550 no cable modems here
554 5.0.0 [EMAIL PROTECTED] Service unavailable

It's a spam reduction move.  I'm surprised hub.freebsd.org accepts your
mail!  You should funnel your mail through your ISP's central mail hub.

Followups to -chat, I think.

Stephen.



Re: Is compatibility for old aout binaries broken?

2000-12-18 Thread Stephen McKay


On Monday, 18th December 2000, "Donald J . Maddox" wrote:

On Mon, Dec 18, 2000 at 04:41:17PM +1000, Stephen McKay wrote:
 
 I expected some build tool expert to say "Just compile with these
 options".  But they haven't.  So I'll see if the bits have rotted,
 or whether we can keep building ld.so instead of just including
 an age old binary.

Well, if you do manage to uncover the lost magic, please let me know :)

It's getting a little more magic every day to generate a.out stuff,
but not all that bad.  Basically I built lib/csu/i386, gnu/lib/libgcc,
lib/libc and libexec/rtld-aout, in order, with these settings:

NOMAN=yup DESTDIR="" OBJFORMAT=aout MAKEOBJDIRPREFIX=/usr/obj/aout

In each directory, I used make obj, make, make install.  (By the way,
there are a lot of twisty little passages in /usr/share/mk.  One of
them required me to add DESTDIR="", which should be a NOP.)

The generated ld.so has bloated a bit :-) but works fine.  So we could
in principle build ld.so for every release.  It's just a question of
whether we should.  I think we should.  But it might be just as easy
to copy it off the 3.3 CD every time.  It's dead end stuff after all.

Does the release engineer have an opinion?

Stephen.



Re: Is compatibility for old aout binaries broken?

2000-12-18 Thread Stephen McKay


On Tuesday, 19th December 2000, Stephen McKay wrote:

But it might be just as easy to copy it off the 3.3 CD every time.

Oops!  As I wrote earlier, 3.3 and onward have the broken ld.so.  Good
copies are found on 3.0 through to 3.2.

Sorry for veering off the road there. :-)

Stephen.



Re: Is compatibility for old aout binaries broken?

2000-12-18 Thread Stephen McKay


On Monday, 18th December 2000, Jordan Hubbard wrote:

 The generated ld.so has bloated a bit :-) but works fine.  So we could
 in principle build ld.so for every release.  It's just a question of
 whether we should.  I think we should.  But it might be just as easy
 to copy it off the 3.3 CD every time.  It's dead end stuff after all.
 
 Does the release engineer have an opinion?

If it's just for the compat3x distribution, I say check it into that
part of lib/compat and be done with it.  Uudecoding it each time is a
lot easier than building it.  Or are we talking about ld.so in some
different context?

I hadn't noticed all the uuencoded things in lib/compat before.  This
is obviously the way to fix it.

By the way, it's the compat22 distribution that needs fixing, and, as
previously noted, it's the 3.2 CD that has the last fully working ld.so.

I'll get onto committing a fix.

Stephen.



Re: Is compatibility for old aout binaries broken?

2000-12-17 Thread Stephen McKay


On Saturday, 16th December 2000, "Donald J . Maddox" wrote:

The other day, on a whim, I decided to try running an old binary
of SimCity (the same one found in the 'commerce' directory on
many FBSD cds), and it failed in an odd way...

You and I may be the only people in the world that run old binaries.
This has been broken for new users for some time. :-(  Those of us
upgrading from source have been immune to this problem, because we
retain the old a.out ld.so binary.

/usr/libexec/ld.so: Undefined symbol "___error" called from
sim:/usr/X11R6/lib/aout/libX11.so.6.1 at 0x20160644

When errno became a function that returns a pointer (previously it was
a simple integer variable), recompiled libraries became incompatible with
old binaries.  So, I hacked the a.out loader (ld.so).  The fix was in 3.0.
Well, Nate called it a horrible hack, so maybe I should say "the hack was
in 3.0".

Am I overlooking something obvious here, or is something actually
broken with respect to running old aout binaries?

I found that rtld-aout won't compile.  That's kinda broken.
(It's probably something simple.  Looks like the a.out version of
a pic library just isn't around any more).  I'll try harder later.
What's certain is that it isn't compiled by default.

I poked about with my old FreeBSD CD collection and found that
version 3.0 through 3.2 have a fully functioning (fully hack enabled)
ld.so, but an older binary has been substituted in 3.3 and onward,
including 4.0 and 4.1, and most likely 4.2 also.

I can only guess that some anonymous release engineer (nobody we know :-)
picked the wrong CD at some point to get the master copy of ld.so once
it stopped compiling.  (Or at least stopped being easily compiled.)

Ideally, rtld-aout would be compiled fresh for every release.  Until then,
you can repair your system by retrieving ld.so from a 3.3 CD (in the
compat22 section), or from a 3.2 live filesystem CD.

Stephen.



Re: Is compatibility for old aout binaries broken?

2000-12-17 Thread Stephen McKay


On Sunday, 17th December 2000, "Donald J . Maddox" wrote:

Under the circumstances, it seems silly to have aout compat
bits installed at all, seeing as how they cannot work.

Old programs that don't depend on recompiled libraries are fine.  I can't
guess at the percentages though.  Also, nearly everybody has recompiled
for elf, where this problem never occurred.

Like you, I normally upgrade from source --  This box has
been -current ever since 2.0.5 or so was -current, but I
had to reinstall from scratch a while back by installing
4.2-RELEASE and then cvsupping back to -current, so I
guess I lost my working aout ld.so in the process.  Bummer :(

I expected some build tool expert to say "Just compile with these
options".  But they haven't.  So I'll see if the bits have rotted,
or whether we can keep building ld.so instead of just including
an age old binary.

Stephen.



A tiny Perl bug?

2000-11-22 Thread Stephen McKay


I was trying to get FreeBSD 4.2-BETA to compile under FreeBSD 3.4 when
I found that the use of the new setresgid() and setresuid() system
calls was causing the perl5 compile to fail.  I got around this using
NOPERL=yup but while investigating I noticed an apparent bug in the use
of setresgid() and propose this patch:

Index: mg.c
===
RCS file: /cvs/src/contrib/perl5/mg.c,v
retrieving revision 1.1.1.4
diff -u -r1.1.1.4 mg.c
--- mg.c    2000/08/20 08:42:14 1.1.1.4
+++ mg.c    2000/11/22 12:01:32
@@ -1926,7 +1926,7 @@
     (void)setregid((Gid_t)PL_gid, (Gid_t)-1);
 #else
 #ifdef HAS_SETRESGID
-    (void)setresgid((Gid_t)PL_gid, (Gid_t)-1, (Gid_t) 1);
+    (void)setresgid((Gid_t)PL_gid, (Gid_t)-1, (Gid_t)-1);
 #else
     if (PL_gid == PL_egid)  /* special case $( = $) */
         (void)PerlProc_setgid(PL_gid);

I assume this was just a typo.  I can't think of any reason to try to
set the saved gid to daemon.  I'd whip in and commit this myself, but
I'm sure there are "vendor branch considerations", and I've never
found out what's involved with that.
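
To make the effect of the typo concrete, here is a small model of the
setresgid() argument convention, a sketch only.  The struct gids type and
the functions model_setresgid and typo_changes_saved_gid are hypothetical
and run entirely in user space; they ignore the privilege checks the real
system call performs.  The one rule modelled is that an argument of
(gid_t)-1 means "leave this ID unchanged".

```c
#include <assert.h>
#include <sys/types.h>

/* The three group IDs a process carries. */
struct gids {
    gid_t real;
    gid_t effective;
    gid_t saved;
};

/* By convention, -1 cast to gid_t means "do not change this ID". */
#define KEEP ((gid_t)-1)

/* Model of the argument handling only, with no permission checks. */
struct gids model_setresgid(struct gids g, gid_t r, gid_t e, gid_t s)
{
    if (r != KEEP)
        g.real = r;
    if (e != KEEP)
        g.effective = e;
    if (s != KEEP)
        g.saved = s;
    return g;
}

/* Returns 1 if the call with the typo, (Gid_t) 1, ends up with a
 * different saved gid than the corrected call using (Gid_t)-1. */
int typo_changes_saved_gid(struct gids start, gid_t new_real)
{
    struct gids buggy = model_setresgid(start, new_real, KEEP, (gid_t)1);
    struct gids fixed = model_setresgid(start, new_real, KEEP, KEEP);

    assert(fixed.saved == start.saved);  /* the fix leaves it alone */
    return buggy.saved != fixed.saved;
}
```

Starting from any saved gid other than 1, the buggy form overwrites it
with 1 while the corrected form only touches the real gid.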

And piggybacking a slightly wider issue:  The cross-tools section of
Makefile.inc1 is supposed to address the use of new system calls and
such in build tools, right?  Can we forget about the old "try to use
the new syscall and do something else if it isn't there" code?  And all
we need to do to fix my migration problem is to MFC marcel's miniperl
cross-build fix?  Right?

Otherwise I have all this blather I was going to say about using fancy
new syscalls in perl just to emulate old syscalls we already have, and
the way that makes upgrading harder.  But I don't have to go on about
that, it seems. :-)

Stephen.



Re: Ugly, slow shutdown

2000-08-11 Thread Stephen McKay


Well, I've failed in my main objective (to deuglify the shutdown messages),
but an interesting debate has resulted instead, so I can't feel too bad.

I did a little research to support my position on sleep/wakeup, and here's
the best I have.  This is pretty long, and unlikely to shake your world view,
so those of you with drooping eyelids can just head over to slashdot, or
something. :-)

Some pseudo code from "The Design of the Unix Operating System", by Maurice
Bach, page 33 shows how sleep() is used:

    while (condition is true)
        sleep (event: the condition becomes false);
    set condition true;

and the next page shows how wakeup() is used:

    set condition false;
    wakeup (event: the condition is false);

In the description, it says `Thus, the "while-sleep" loop insures that at most
one process can gain access to a resource.'

Not the most convincing evidence, but on the other hand, he does not mention
the idea of *not* protecting against sudden wakeup.

From "Writing a Unix Device Driver", by Egan and Teixeira, on page 92 we find

    It is not uncommon for several processes to sleep on the same
    channel.  They may be competing for the same resource, or they
    may be waiting for different reasons that have been associated
    with the same channel value.  In this situation a single wakeup
    call on the common channel will cause all the sleeping processes
    to become executable; ...  A driver routine must not assume that
    it can proceed after a return from a sleep call.  It should check
    to see whether the event it was waiting for has actually occurred;
    if it has not it should sleep again, and repeat this cycle until
    the awaited event has actually occurred.

The book is oriented rather towards I/O, so perhaps not all possible uses
of routines are covered.  But again, no mention of *not* using a while loop.
Quite the opposite.

Also "Magic Garden Explained" points out that you really want to sleep on
an "event", but all you have is the address of some data.  So, you often
have multiple semantically different events represented by the same integer
wakeup channel.  A good reason to program defensively, I think.

But the best evidence is from kern_synch.c from 4.2 BSD, line 98, in the
header comment of the sleep() routine:

     * Callers of this routine must be prepared for
     * premature return, and check that the reason for
     * sleeping has gone away.

That comment on sleep() is present from 4.0 BSD up to and including 4.3 tahoe,
but disappears in 4.3 reno, when the 4.4 style tsleep() was introduced.  After
a bit of searching through the PUPS archive, I see it is even present in
Edition 6, character for character, in a file called slp.c.

Well, I knew I wasn't a senile old fart yet, and Kirk's BSD CD compendium
and the PUPS archive show that I remember some things correctly still.  For
a considerable portion of Unix history, sleep() could return for no good reason
at all, and was documented to do so (if only in the source code).

Now, how does this relate to the current day?  Nobody in the BSD world uses
plain sleep() any more.  Once tsleep() appeared, the rules seem to have changed.
Perhaps some people had gotten away with ignoring the dire warnings in the
sleep() code, and decided that unexpected wakeups weren't such a useful part
of the API.  I hope Kirk or other BSD veterans can be coaxed into offering
an opinion.  I'd offer at least one beer for this purpose. :-)

Regardless of the history of it all, FreeBSD is full of places where
unexpected wakeups can stuff you right up.  Should we regard tsleep() like
the older sleep() call, as suspect, and program defensively?  Should we
be pragmatic, admit "We've gotten away with it so far", and document the
"no sudden wakeups" behaviour?

I quite like the general principle outlined in one of the earlier replies,
that a while loop can be shown to be correct through a local code reading,
but a simple conditional must be verified by reading all the rest of the
code.  That's close to the same argument I use against global variables.
Their use is too hard to verify as correct.  In short, I'd like to see
all cases where tsleep() is not carefully used in a loop repaired.

Practically speaking, though, I can't see that happening, especially if
we have any major players against the idea (DG for example).  Given that,
I'd like as a minimum a bit more of the history of sleep() in the tsleep()
manual page, and a discussion of when a while-loop protected tsleep() is
mandatory, and when it is optional.  Some sort of pronouncement against
issuing wakeup() calls against arbitrary addresses would help too.

I would do that right now, except I'm escaping computing for a few months.
Almost heresy nowadays, I suppose.  And I won't be the first in line for
a brain implanted net connection either. ;-)

Stephen.

PS By the time you read this, I've probably

Re: Ugly, slow shutdown

2000-08-07 Thread Stephen McKay


 * Mike Smith [EMAIL PROTECTED] [000807 01:25] wrote:
   * Stephen McKay [EMAIL PROTECTED] [000805 08:49] wrote:

... every sleeping process should expect
to be woken for no reason at all.  Basic kernel premise.
   
   You better bet it's controversial, this isn't "Basic kernel premise"
  
  Actually, that depends.  It is definitely poor programming practice to 
  not check the condition for which you slept on wakeup.
 
 Stephen's patches didn't give them that option, the syncer could be
 in some other part of vfs that doesn't expect to be woken up, perhaps
 in uninterruptible sleep... perhaps waiting for a DMA transfer?
 
 How does one check if the data filled into a buffer is actually from
 the driver and not just stale?

The time honoured standard is:

    raise cpu priority
    while (we do not have exclusive use of some item) {
        set some sort of "I want this item" flag (optional)
        sleep on a variable related to the item
    }
    use the item/data we waited for
    lower cpu priority

A typical example from vfs_subr.c:

    s = splbio();
    while (vp->v_numoutput) {
        vp->v_flag |= VBWAIT;
        error = tsleep((caddr_t)&vp->v_numoutput,
            slpflag | (PRIBIO + 1), "vinvlbuf", slptimeo);
        if (error) {
            splx(s);
            return (error);
        }
    }
    ... the code plays a little with vp here ...
    splx(s);

A simpler example from swap_pager.c:

    s = splbio();

    while ((bp->b_flags & B_DONE) == 0) {
        tsleep(bp, PVM, "swwrt", 0);
    }
    ... code uses bp here ...
    splx(s);

Both of these examples are safe from side effects due to waking up early.
This is how all such code should be.  To do otherwise is to introduce possible
race conditions.
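
The difference between the looped and unlooped forms can be sketched with
a toy model that runs in user space.  Everything here is hypothetical:
struct resource, model_sleep and the two acquire functions are invented
for illustration and are not kernel interfaces.  model_sleep stands in for
tsleep() and returns on every wakeup, whether or not the resource has
actually been released.

```c
#include <assert.h>

/* Hypothetical model of an item guarded by a busy flag. */
struct resource {
    int busy;            /* nonzero while someone else holds the item */
    int wakeups_needed;  /* wakeups delivered before the real release */
};

/* Stand-in for sleep()/tsleep(): returns after every wakeup.  Only the
 * last of the expected wakeups coincides with the item becoming free;
 * the earlier ones are premature. */
static void model_sleep(struct resource *r)
{
    assert(r->wakeups_needed > 0);
    if (--r->wakeups_needed == 0)
        r->busy = 0;
}

/* Defensive form: recheck the condition after every return from sleep.
 * Returns how many times it had to sleep before taking the item. */
int acquire_with_loop(struct resource *r)
{
    int sleeps = 0;

    while (r->busy) {
        model_sleep(r);
        sleeps++;
    }
    r->busy = 1;         /* the loop condition guarantees it was free */
    return sleeps;
}

/* Fragile form: assumes one wakeup means the item is free.
 * Returns 1 if the item really was free, 0 if we barged in on it. */
int acquire_with_if(struct resource *r)
{
    int was_free;

    if (r->busy)
        model_sleep(r);
    was_free = !r->busy;
    r->busy = 1;
    return was_free;
}
```

With two premature wakeups queued ahead of the real one, the looped form
sleeps three times and then proceeds safely, while the unlooped form
proceeds after the first wakeup with the item still busy.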

At your prompting, though, I've looked at more code and have found an example
that violates this principle.  I assume it is a bug waiting to bite us.  In
the 4.1.0 source (sorry, that's all I have on operational computers at this
moment) line 581 of vfs_bio.c sleeps without looping.  It would seem that
Alfred's assertion of lurking danger is correct.  This stuff should be fixed.

   *boom* *crash* *ow* :)
  
  Doctor:  So don't do that.
  
  In this case, the relevant processes just need to learn to check whether 
  they've been woken in order to die.
 
 No, they need to signify that it's safe to wake them up early.

When I return to the land of FreeBSD I'll offer a speedup that does not wake
processes in arbitrary places (to avoid tickling lurking bugs).  To do this
I would make processes that want to use the suspension mechanism call a
routine in kern_kthread.c for their just-loafing-about sleep.  Then that
module will have enough information to do the job quickly.

And back to the simpler bit (the bike shed bit).  Does everyone else actually
*like* the verbose messages currently used?  And the gratuitous extra newline
in the "syncing..." message?

Stephen.

PS My main machine has blown its power supply.  Contact with me will be patchy.



Ugly, slow shutdown

2000-08-05 Thread Stephen McKay


I'm off in a few days for a couple months of tourism in Europe (no, no need
for sympathy!), so I'm dumping these couple ideas on you and running.

I think shutdown time has gotten uglier and slower than it needs to be.
I want to apply these patches (well, at least the first one) before I escape
radar range.  Your job is to not object much. :-)

Patch 1 replaces:

  Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped

with

  Stopping bufdaemon

Also:

  syncing disks... 10 10 3
  done

returns to the traditional

  syncing disks... 10 10 3 done

Patch 2 is smaller and possibly controversial.  Normally bufdaemon and
syncer are sleeping when they are told to suspend.  This delays shutdown
by a few boring seconds.  With this patch, it is zippier.  I expect people
to complain about this shortcut, but every sleeping process should expect
to be woken for no reason at all.  Basic kernel premise.

I've been running these patches on a 4.x machine for a while now.  No
problems except I am now surprised by the slow and ugly shutdown of
unpatched machines. :-)

I apologise that I've not tested these against -current.  That's the bit
that I've skipped because I'm out of time.  There should be no difference
between 4.x and -current in this area though.  These patches will apply
cleanly against both.

Cheers,

Stephen.

Patch 1:
Index: kern_shutdown.c
===
RCS file: /cvs/src/sys/kern/kern_shutdown.c,v
retrieving revision 1.76
diff -u -r1.76 kern_shutdown.c
--- kern_shutdown.c 2000/07/04 11:25:22 1.76
+++ kern_shutdown.c 2000/07/06 15:02:21
@@ -247,7 +247,6 @@
sync(&proc0, NULL);
DELAY(5 * iter);
}
-   printf("\n");
/*
 * Count only busy local buffers to prevent forcing 
 * a fsck if we're just a client of a wedged NFS server
@@ -261,6 +260,8 @@
bp->b_vp->v_mount, mnt_list);
continue;
}
+   if (nbusy == 0)
+   printf("\n");
nbusy++;
 #if defined(SHOW_BUSYBUFS) || defined(DIAGNOSTIC)
printf(
@@ -593,12 +594,11 @@
return;
 
p = (struct proc *)arg;
-   printf("Waiting (max %d seconds) for system process `%s' to stop...",
-   kproc_shutdown_wait, p->p_comm);
+   printf("Stopping %s", p->p_comm);
error = suspend_kproc(p, kproc_shutdown_wait * hz);
 
if (error == EWOULDBLOCK)
-   printf("timed out\n");
+   printf(": timed out\n");
else
-   printf("stopped\n");
+   printf("\n");
 }


Patch 2:
Index: kern_kthread.c
===
RCS file: /cvs/src/sys/kern/kern_kthread.c,v
retrieving revision 1.5
diff -u -r1.5 kern_kthread.c
--- kern_kthread.c  2000/01/10 08:00:58 1.5
+++ kern_kthread.c  2000/08/05 15:32:06
@@ -116,6 +116,12 @@
 */
if ((p->p_flag & P_SYSTEM) == 0)
return (EINVAL);
+   /*
+* The target process is probably just snoozing.  Wake it up so
+* that it will notice that it should suspend itself.
+*/
+   if (p->p_wchan != NULL)
+   wakeup(p->p_wchan);
SIGADDSET(p->p_siglist, SIGSTOP);
return tsleep((caddr_t)&p->p_siglist, PPAUSE, "suspkp", timo);
 }

TheEnd



Re: dc driver and underruns (was: Strangeness with 4.0-S)

2000-07-16 Thread Stephen McKay


On Friday, 14th July 2000, "Rodney W. Grimes" wrote:

  I suspect an interaction between the ATA driver and VIA chipsets,
 because other than the network, that's all that is operating when I see
 the underruns.  And my Celeron with a ZX chipset is immune.

I've seen them on just about everything, chipset doesn't seem to matter,
IDE or SCSI doesn't seem to matter.

Well, maybe they are just a fact of life.  But using just my vague knowledge
of how PCI works, it doesn't look inevitable to me.  So I see bugs. :-)

 Getting even more technical, it appears to me that the current driver
 instructs the 21143 to poll for transmit packets (ie a small DMA)
 every 80us even if there are none to be sent.  I don't know what percentage
 of bus time this might be, or even how to calculate it (got some time Rod?)

I'll have to look at that.  If it is a simple 32 bit read every 80uS
that's something like .1515% of the PCI bandwidth, something that shouldn't
matter much.  (I assumed a simple 4 cycle PCI operation).  Just how big
is this DMA operation every 80uS?

I believe it is just one 32 bit read.  But I don't understand that aspect
of the hardware very well yet.  I also suspect that this polling adds
to the latency, but again, I haven't got to the end of that either.
Sometimes other things can distract you from even the most interesting
technical matter. :-)
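
The .1515% figure quoted above can be reproduced with a little
arithmetic.  This is a sketch under stated assumptions: the 4 cycle
transfer and the 80us interval come from the discussion, but the 33 MHz
bus clock is my assumption (it is the value that yields the quoted
number), and the function name bus_share_percent is hypothetical.

```c
#include <assert.h>

/* Percentage of bus time taken by a short transfer repeated at a fixed
 * interval.  bus_hz is the bus clock, cycles_per_poll the length of one
 * transfer in bus cycles, poll_interval_s the time between transfers. */
double bus_share_percent(double bus_hz, double cycles_per_poll,
                         double poll_interval_s)
{
    double busy_s;

    assert(bus_hz > 0 && poll_interval_s > 0);
    busy_s = cycles_per_poll / bus_hz;  /* time one poll holds the bus */
    return 100.0 * busy_s / poll_interval_s;
}
```

Calling bus_share_percent(33e6, 4, 80e-6) gives roughly 0.15, i.e. 4
cycles of about 30ns each is about 121ns of bus time in every 80us.  A
larger DMA burst scales the result linearly with the cycle count.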

Stephen.



dc driver and underruns (was: Strangeness with 4.0-S)

2000-07-13 Thread Stephen McKay


On Monday, 10th July 2000, Stefan Esser wrote:
On 2000-07-09 20:52 +1000, Stephen McKay [EMAIL PROTECTED] wrote:
 On Saturday, 8th July 2000, Stefan Esser wrote:
 
Oh, there are renegotiations after each overrun ???

 The code at the point that an underrun is detected is:
 
  printf("dc%d: TX underrun -- ", sc->dc_unit);
  if (DC_IS_DAVICOM(sc) || DC_IS_INTEL(sc))
  dc_init(sc);
  
 After that, it sets the new threshold, or store and forward mode.  That
 conditional (which resets the DE-500 style cards I own), looks deliberate
 since it is so specific.  Either that, or Bill was being conservative.
 When I get a chance, I will experiment with removing it.

Well, the DE Driver (DEC 21x4x) has (relevant lines marked ***):

 [SNIP: code showing de driver does not reset chip]

I've now read the 21143 chip manual from Intel.  What the de driver does
is illegal (the transmitter must be idle when the threshold is changed).
I don't know if it works in practice; the de driver didn't work well for
me.  What the dc driver does is overkill.  I will implement some changes,
based on the documentation, and see what happens.

Of course, Bill, if you have direct experience that contradicts the
documentation (as if I've never seen incorrect doco...) then I'm all
ears.  I also have a very limited range of test hardware.

I agree, that for chips that need to be completely re-initialized, the
default might be store-and-forward ...

There are so many DEC 21x4x clones, all slightly different, and it seems
that at least a few need the chip reset.

There is already a convenient store-and-forward-only flag that is set
for one of the supported chips.  I propose that this flag be set on all
hardware that cannot have the threshold changed without a reset.

 It hides the problem very well for me.  I really can't see the tiniest
 of performance loss with store and forward.  Maybe it's something that
 only shows up on benchmarks.

Guess it will show up if you measure latencies (or your application is
doing lots of RPCs). But as soon as there is a cheap 100baseT switch in
the path to the destination, there will be store-and-forward at work ;-)

Does anyone here actually measure these latencies?  I know for a fact
that nothing I've ever done would or could be affected by extra latencies
that are as small as the ones we are discussing.  Does anybody at all
depend on the start-transmitting-before-DMA-completed feature we are
discussing?

Lastly, some people really want to keep the messages.  Is hiding them
behind bootverbose enough?  Or do I have to add a flag/hint?  No, I
haven't looked at the new hint system, so I don't know if I should
be afraid or not. :-)

Stephen.


Re: dc driver and underruns (was: Strangeness with 4.0-S)

2000-07-13 Thread Stephen McKay


On Thursday, 13th July 2000, "Rodney W. Grimes" wrote:

On Thu, 13 Jul 2000, Stephen McKay wrote:
 
Does anyone here actually measure these latencies?  I know for a fact
that nothing I've ever done would or could be affected by extra latencies
that are as small as the ones we are discussing.  Does anybody at all
depend on the start-transmitting-before-DMA-completed feature we are
discussing?
 
 I don't like the idea of removing that feature.  Perhaps it should be a
 sysctl or ifconfig option, but it should definitely remain available.
 Those minute latencies are critical to those of us who use MPI for
 complex parallel calculations.

I have to agree here.  The store and forward adds an approximate
11uS (by theory under ideal conditions 1500bytes@132MB/s = 11uS,
practice actually makes this worse as typical PCI does something
less than 100MB/s or 15uS) to a 120uS packet time on the wire (again,
ideal, but here given that switches, and in fact often cut-through
switches, are used for these types of things, ideal and practice
are very close.)

I don't think these folks, nor myself, are wanting^H^H^H^H^H^H^Hilling
to give up 12.5%.
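
Those figures can be reproduced with a few lines of C.  This is a sketch
only: the function names are made up here, MB/s is taken to mean 10^6
bytes per second, and the 100Mbit/s wire speed is inferred from the 120uS
packet time rather than stated in the thread.

```c
#include <assert.h>

/* Microseconds to move `bytes` across a bus running at `mbytes_per_sec`. */
double
bus_transfer_us(double bytes, double mbytes_per_sec)
{
	assert(mbytes_per_sec > 0.0);
	return bytes / mbytes_per_sec;	/* 10^6 bytes/s == 1 byte/us */
}

/* Microseconds a packet of `bytes` occupies a wire at `mbits_per_sec`. */
double
wire_time_us(double bytes, double mbits_per_sec)
{
	assert(mbits_per_sec > 0.0);
	return bytes * 8.0 / mbits_per_sec;
}

/* Store-and-forward delay as a fraction of the packet's time on the wire. */
double
added_latency_fraction(double bytes, double bus_mbytes_per_sec,
    double wire_mbits_per_sec)
{
	return bus_transfer_us(bytes, bus_mbytes_per_sec) /
	    wire_time_us(bytes, wire_mbits_per_sec);
}
```

For a 1500 byte packet this yields about 11.4uS at 132MB/s, 15uS at
100MB/s, 120uS on the wire, and 15/120 = 12.5%.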

OK.  It seems that repairing the feature, rather than disabling it is
the most popular option.  Still, I am quite interested in finding anyone
who actually measures these things, and is affected by them.  These very
same people might be able to trace why we get the underruns in the first
place.  I suspect an interaction between the ATA driver and VIA chipsets,
because other than the network, that's all that is operating when I see
the underruns.  And my Celeron with a ZX chipset is immune.

Back to the technical, for a moment.  I have verified that stopping the
transmitter on the 21143 is both sufficient and necessary to enable the
thresholds to be set.  I have code that works on my machine.  I intend
to commit it when I think it looks neat enough.
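
The stop, change, restart sequence described above might look roughly like
the following.  This is not the code that was committed: the structure, the
register bits and every toy_* name are invented for illustration and do not
match the real 21143 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define TOY_TX_ON		0x1u
#define TOY_STORENFWD		0x2u
#define TOY_THRESH_SHIFT	4
#define TOY_THRESH_MASK		(0x3u << TOY_THRESH_SHIFT)
#define TOY_THRESH_MAX		3u

struct toy_nic {
	uint32_t netcfg;	/* stand-in for the chip's mode register */
	bool	 tx_idle;	/* stand-in for a "transmitter idle" status bit */
};

/* Simulated hardware: the transmitter is idle whenever TX_ON is clear. */
void
toy_write_netcfg(struct toy_nic *nic, uint32_t val)
{
	nic->netcfg = val;
	nic->tx_idle = (val & TOY_TX_ON) == 0;
}

/*
 * On a TX underrun: stop the transmitter, wait for idle, raise the
 * threshold one step (or switch to store-and-forward once the top step
 * has been reached), then restart.  Returns false if the transmitter
 * never went idle.
 */
bool
toy_tx_underrun(struct toy_nic *nic)
{
	uint32_t cfg, thresh;
	int i;

	assert(nic != NULL);
	cfg = nic->netcfg;
	thresh = (cfg & TOY_THRESH_MASK) >> TOY_THRESH_SHIFT;

	toy_write_netcfg(nic, cfg & ~TOY_TX_ON);
	for (i = 0; i < 1000 && !nic->tx_idle; i++)
		;	/* a real driver would delay between status reads */
	if (!nic->tx_idle)
		return false;

	cfg &= ~(TOY_TX_ON | TOY_THRESH_MASK);
	if (thresh >= TOY_THRESH_MAX)
		cfg |= TOY_STORENFWD;
	else
		cfg |= (thresh + 1) << TOY_THRESH_SHIFT;
	toy_write_netcfg(nic, cfg);		/* change settings while idle */
	toy_write_netcfg(nic, cfg | TOY_TX_ON);	/* restart the transmitter */
	return true;
}
```

The point of the sketch is only the ordering: the threshold is never
touched while the transmitter is running, and no full chip reset is needed.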

Getting even more technical, it appears to me that the current driver
instructs the 21143 to poll for transmit packets (ie a small DMA)
every 80us even if there are none to be sent.  I don't know what percentage
of bus time this might be, or even how to calculate it (got some time Rod?)
but it looks unnecessary to me.  I think the transmitter could be turned
off regularly.  At the moment, the driver leaves it on all the time.

And to the non technical: Do the messages go or stay?  I've heard both
sides.  For most people they are just annoying fluff.  For those who
actually care about the latency, it might be informative, and thus
too useful to be hidden behind bootverbose.  Opinions?

Stephen.


Re: Problems installing FreeBSD 4.0 20000125-CURRENT

2000-01-28 Thread Stephen McKay


On Thursday, 27th January 2000, "Rodney W. Grimes" wrote:

 On Thu, 27 Jan 2000 13:28:10 -0800, "Jordan K. Hubbard" [EMAIL PROTECTED] said:
 
  3. On the first reboot after installing, the keyboard was in a funny
  state.

I have seen this on numerous occasions, but have never tracked it down
to any one specific thing.  All on desktops and servers, but that's
only because we don't do laptops.

I have not seen it in quite some time (about a month), so I am thinking
it has probably been unknowingly fixed someplace.  I'll keep an eye
out for it.

I had this problem on several machines back around version 3.2.  I assumed
it was a problem between X11 and the keyboard driver.  I added a 2 second
delay before starting xdm and had no problems after that.  I've not seen
the problem without X11 being involved.  I admit I just forgot about it
after I got my workstation going. :-(

Stephen.


That fix for the ^T crash

2000-01-27 Thread Stephen McKay


Hi, Brian!

I'm concerned that your fix won't make it before the code freeze.  Is
there a problem with it?  I admit I haven't actually tested it. :-(
My excuse is that I assumed you had.

Or should I just do a quick test on your patch (+ bde fixes) and commit
it myself?

Stephen.


Crash from ^T during heavy paging

2000-01-09 Thread Stephen McKay


I'm currently giving 4.0 a thrashing in the best way I know.  I run way too
much stuff and let it page madly all day.  Here's how I killed it:

1) pick a 32MB box
2) make -j20 buildworld
3) lean on ^T and let autorepeat go for it

Soon it dies in calcru() called from ttyinfo().  The stack trace showed
that I caught it part way through a fork().  In calcru(), p->p_stats has
a bad value because it is initialised in vm_fork() sometime *after* the
P_INMEM flag is set, and there are some M_WAITOK mallocs between them.

The problem is that calcru() thinks that P_INMEM means that the proc
structure is fully and accurately populated.  But P_INMEM is one of the
first flags set.

A few places test for p->p_stats == NULL but that doesn't look applicable
since p->p_stats is uninitialised in this case.  Hmm.  I can't see any
use for that test at first glance.

So, calcru() and possibly some other places, are looking at a struct proc
before it's all there.  What's the "proper" way to do it?
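
To make the ordering problem concrete, here is a toy model in C.  It is not
kernel code and not a proposed fix: the toy_* names are invented, and it
only shows one conventional ordering (give the pointer a defined value
early, set the "ready" flag last) under which a reader like calcru() could
not see a half-built structure.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INMEM 0x1		/* hypothetical stand-in for P_INMEM */

struct toy_stats {
	long ticks;
};

struct toy_proc {
	int flags;
	struct toy_stats *stats;	/* stand-in for p->p_stats */
};

/* Reader in the style of calcru(): only looks once the proc is marked ready. */
long
toy_read_ticks(const struct toy_proc *p)
{
	assert(p != NULL);
	if ((p->flags & TOY_INMEM) == 0 || p->stats == NULL)
		return -1;	/* not fully built yet, report nothing */
	return p->stats->ticks;
}

/* Builder: fill in every field first, publish the flag last. */
void
toy_proc_init(struct toy_proc *p, struct toy_stats *st)
{
	p->flags = 0;
	p->stats = NULL;	/* defined value while anything might sleep */
	st->ticks = 0;
	p->stats = st;
	p->flags |= TOY_INMEM;	/* only now may readers trust the contents */
}
```

Note that the NULL test only helps because the builder clears the pointer
first; with an uninitialised pointer, as in the crash above, it proves
nothing.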

Stephen.


Small fix to netstat argument processing

2000-01-06 Thread Stephen McKay


I've got very used to an alias ns='netstat -f inet' which lets me do all
the things I like to do without annoying me with stuff I don't want to
see.  All the options that don't care about the address family just ignore
that option.  Or, used to.

Recently that changed, and "netstat -f inet -i" in particular changed to
give the -f flag priority over the -i flag.  This makes no sense to me,
so I intend to commit this patch:

--- netstat/main.c.old  Tue Jan  4 16:14:46 2000
+++ netstat/main.c  Thu Jan  6 18:19:24 2000
@@ -460,9 +460,6 @@
 	 */
 #endif
 	if (iflag) {
-		if (af != AF_UNSPEC)
-			goto protostat;
-
 		kread(0, 0, 0);
 		intpr(interval, nl[N_IFNET].n_value, NULL);
 		exit(0);
@@ -501,7 +498,6 @@
 		exit(0);
 	}
 
-  protostat:
 	kread(0, 0, 0);
 	if (af == AF_INET || af == AF_UNSPEC)
 		for (tp = protox; tp->pr_name; tp++)

It removes the special case that specifically makes "netstat -f inet -i"
act the opposite to the way it used to (and the way I expect).
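
The behaviour change can be modelled in a few lines of C.  This is a
deliberately simplified sketch: the real main() handles several other flags
between these two cases, and the toy_* names and constants below are
stand-ins invented here.

```c
#include <assert.h>

#define TOY_AF_UNSPEC	0	/* stand-in for AF_UNSPEC */
#define TOY_AF_INET	2	/* stand-in for AF_INET */

enum toy_mode { TOY_MODE_INTERFACES, TOY_MODE_PROTOCOLS };

/* Behaviour the patch removes: an explicit -f family overrides -i. */
enum toy_mode
choose_mode_before(int iflag, int af)
{
	if (iflag && af == TOY_AF_UNSPEC)
		return TOY_MODE_INTERFACES;
	return TOY_MODE_PROTOCOLS;
}

/* Behaviour after the patch: -i wins, whatever -f says. */
enum toy_mode
choose_mode_after(int iflag, int af)
{
	(void)af;
	return iflag ? TOY_MODE_INTERFACES : TOY_MODE_PROTOCOLS;
}
```

With iflag set and af equal to TOY_AF_INET, the first function picks the
protocol display and the second picks the interface display, which is the
"netstat -f inet -i" case discussed above.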

Any problems, folks?  Is there some bizarre IPv6 impact I've not seen?

Hmm, I've just noticed some small misalignment of column headings in the
default output.  I'll fix that too.

Stephen.

PS Roll on 4.0-RELEASE!


Re: Small fix to netstat argument processing

2000-01-06 Thread Stephen McKay


On Thursday, 6th January 2000, Yoshinobu Inoue wrote:

Do these patches fix your problem, or is another, better fix
desired? Please give me your opinion.

It passes all my tests.  Please commit it.  Thank you!

And earlier you wrote:

Because now there is interface statistics display mode, when, e.g.

  netstat -s -I bar0 -f inet6

is specified. (though this is inet6 only now.)

I see where you are going now.  The syntax of netstat, already complex,
is becoming even more complex.  More detail in the man page will be
necessary soon.  Also, the "iflag" variable might have too many uses
now.  But this can wait, now that the immediate difficulties have
been resolved.

Stephen.


Re: HEADSUP: wd driver will be retired!

1999-12-10 Thread Stephen McKay


On Friday, 10th December 1999, "Kenneth D. Merry" wrote:

Brad Knowles wrote...
 At 3:05 PM -0700 1999/12/10, Kenneth D. Merry wrote:
 
   I agree that the CAM integration shouldn't be used as a precedent here.
   I don't agree with your characterization of it as a "debacle", though.
 
   On the whole, we gained a whole lot and lost very little.
 
  Long-term, yes I believe we gained a lot.  Short-term, what I 
 recall having heard from some of the people who lived through it, 
 well let's just say it was really ugly and nasty for a certain period 
 of time.

I don't think it was ugly and nasty at all.  You're basing your opinions
on second hand hearsay.  If you can produce specific examples of why it
was "really ugly and nasty", fine, but why not avoid making statements you
can't support?

This must depend on your perspective.  My first hand view is that it was
ugly and nasty.  This is because I lost support for hardware I was actively
using (some temporarily, some permanently), and because I had no control
over the pace of change.  For a bunch of reasons, there was no way I could
keep up (and that meant porting old drivers to keep up).  It sure felt
ugly to me.  The unnecessary renaming of device files made it worse.

But that shouldn't stop us from moving forward with the ata driver.  I
think that a small slowing of the pace, and a bit more understanding toward
those with unusual hardware will help.  And I support PHK's hard line
stance (except for the rushed pace) toward making the kernel break for
users of wd.  It has to be so, or no one will move.  The wd code will
still be in the CVS tree for desperate people to revive to use, and to
port the missing bits into the ata driver.

Stephen.


Fsck follies

1999-11-21 Thread Stephen McKay


I was giving vinum + softupdates a bit of a workout on 4 really old
SCSI disks (Sun shoeboxes, if you must know) attached to an aha1542B.
The rest of the machine is a Pentium 133 with 64MB of parity ram, a
few more disks, and another aha1542B.  It runs -current (about 10 days
old now).

I was copying a newer -current source tree onto the box when I lost power
to my house for maybe half a second.  Being foolish and shortsighted, I
have no UPS.  (An interesting side note: out of the 3 machines in use at
the time, 2 of the keyboards locked up and required a power down to recover.
I was unaware that keyboards could crash.)

When the system came back up, fsck -p didn't like the vinum volume.
No sweat, I ran it manually.  There were many

INCORRECT BLOCK COUNT I=n (4 should be 0)

messages.  I assumed this was an artifact of soft updates.  The fsck
completed successfully.

Being paranoid, I reran fsck.  This time it reported a number of
unreferenced inodes (199 to be exact), and linked them in to lost+found.

It is this last item that bothers me.  When the first fsck completed,
the filesystem should have been structurally correct.  But it wasn't.
A third fsck confirmed that 2 runs of fsck were enough.

I seem to recall sage advice from days gone by to always run fsck twice
and sync thrice.  I thought I could forget all that stuff nowadays.

By the way, I saved the broken old source tree and compared it to the
full tree.  There were no unusual differences, except for the broken
one being incomplete.  So, if fsck were a little better, things would
be fine.  As good as you could expect, given a power failure.

Stephen.


SCSI surprise! (was: Softupdates reliability?)

1999-08-30 Thread Stephen McKay


[I'm trying my first crosspost experiment here.  Please follow up to -scsi.]

A week ago I posted my strange crash and subsequent doubts about the proper
functioning of softupdates.  This is more of the story.

I examined the lost+found directory more closely and of the few files that
I traced, they were all temporary files or newly created directories (ports
actually) created in the CTM update process.  So, maybe I didn't really
lose anything.  Maybe fsck just doesn't recognise one of the safe-but-crashed
modes you get when using softupdates.  But unfortunately, I needed a CVS tree
urgently and restored a backup.  To make up for this, I promise to do serious
destruction testing of softupdates soon.

But, I had another crash almost as soon as I started using the machine again.
Again, the Exabyte was being used (but only rewinding at the time), but the
obvious trigger this time was intense disk activity (from "rm").  The active
file system was not using softupdates, and had a number of fsck -p correctable
errors on reboot.  Conclusions:

1) The Exabyte was not to blame for the crash
2) The crash wasn't a "scribble junk" crash (first one probably wasn't either)
3) Regular mounts are still safer than softupdates

I took the lid off anyway hoping to find anything at all weird and noticed
something I had forgotten.  I was using a Seagate ST51080N 1GB disk earlier
for some experimenting and had disconnected the POWER, but not the SCSI CABLE.
(It's a really noisy drive!) When I also unplugged the SCSI cable, all crashes
stopped.  I've now used the machine intensively for several days (copying over
20GB of small and big files, and read and written several tapes) without
incident.  Conclusions:

4) My stepping of K6-2/300 is just fine
5) My Exabyte really is ok :-)
6) It is NOT safe to have a powered down SCSI device attached to a SCSI chain
7) The world really is a wonderful place ;-)

So, apart from being happy at having stable hardware again, I am intensely
curious about this.  Why is a powered down SCSI device so nasty?  For example,
the first crash locked up my SCSI card so that reset didn't fix it, and the
second crash hung one of my disks so that it had to be powered down to even
be recognised!  Is there a standard for this stuff?

Stephen.


Re: Softupdates reliability?

1999-08-25 Thread Stephen McKay


On Tuesday, 24th August 1999, Wilko Bulte wrote:

Hmm. I would generally expect SCSI errors etc to occur. Assuming the driver
reports those one would at least know the bus was whacko.

I saw no errors, but that's not entirely surprising since I was running X11
and by that time xconsole was probably swapped out, and the disk system
was stuck, so it wouldn't have been able to report anything.  I gave up on a
serial console a very long time ago because this machine is so reliable. :-)

Also, I recall (rumour?) that the ncr driver is not as robust in the face
of errors as the adaptec driver, at least with CAM.  Anybody know the facts?
I know, for example, that I can't get bad block lists using my scsi adapter,
but people using adaptecs can.  That shows that the ncr driver is in some
sense incomplete.  I've been meaning to look into that, but you know how
time gets away.

So, after all this, I still don't know if I have any real evidence of anything
at all.  I'll just have to keep at it until it happens again.

Stephen.


Re: Softupdates reliability?

1999-08-24 Thread Stephen McKay


On Tuesday, 24th August 1999, Peter Jeremy wrote:

The exact order of events is not clear from this.  In general, I'd say
that if something managed to upset the SCSI bus sufficiently to
confuse every target on it, then there's a reasonable likelihood that
data transfers were also corrupted.  A serious bus corruption during a
disk write (either command or data phase) would have a reasonable
chance of resulting in corrupt data on the disk (either the wrong data
in the right place or the right data in the wrong place).

Yes, I can't tell whether the confused SCSI adapter upset the Exabyte and
maybe zero'd some disk sectors, or whether the Exabyte went bananas first
and took out everything else.  This system gets a LOT of use (I'm using it
right now), but the Exabyte obviously isn't used as often as the disks.
I might move the Exabyte on to an aha1542 as a precaution.

I'm not sure how to go about isolating the problem.  I don't suppose
you happened to bump one of the cables, or suffer a power glitch?

No power glitch or bumped cables.  All quality gear, no overclocking, good
cooling, surge suppressors, etc.  I don't like "It was just one of those
things".  That's not how computers work.  I've either got bad hardware or
there are bugs.  To counter the bugs, I'm about to go to the latest -stable.
Bad hardware will show itself eventually.  What I really should do is
build a test system with softupdates and crash it a lot.  (Using DDB
to pause, then switch off, so no partial writes.)  Could take a while...

Oh, and Brian wanted to know the processor revision.  I don't know of any
problems with K6-2/300s, but here's the info:

CPU: AMD-K6(tm) 3D processor (300.68-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x580  Stepping=0
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>

Stephen


K6-2 revisions (was: Re: Softupdates reliability?)

1999-08-24 Thread Stephen McKay


On Tuesday, 24th August 1999, "Brian F. Feldman" wrote:

On Tue, 24 Aug 1999, Richard Tobin wrote:

 Origin = "AuthenticAMD"  Id = 0x580  Stepping=0
 
  You have one of the first K6-2s off the line. There were definite problems
  with these, and as such, they were specially distinguished by having 66
  printed on top.
 
 I have a 0x580 which has had no problems at all.  I'm pretty certain
 it doesn't have 66 stamped on it.  Are they all supposed to have this,
 or were they tested and the dodgy ones stamped 66?

It must be the latter. My 0x580 had the 66, so it must be that the dodgy
ones got labelled 66 and not all the 0x580s were defective.

I think the story went along the lines that AMD were making K6-2/300's for
a while, then went to a less rigorous test procedure for just a short time
until they realised that some of the processors they released wouldn't work
at 100MHz bus speeds, though they were ok at 66MHz.  So they went back to
the better testing procedure for the 100MHz models, but also released some
66MHz only models.

Mine was indeed one of the earliest, but there have been no problems with it,
and during my strange disk crash the CPU kept updating the X11 load graph 
and stuff.  The problem(s) must be elsewhere.

Stephen.


Re: Stuck in objtrm

1999-07-06 Thread Stephen McKay


On Friday, 2nd July 1999, Stephen McKay wrote:

I have an old 486 here that I thrash to death occasionally.  Well, at least
I try to get it to page to death.  I started a make world last week and
forgot about it.

Today I noticed that it's been stuck for most of the week.  Almost everything
is fine, but one cc1 process is stuck in "objtrm".  Oh, and I hung a "cat
/proc/31624/map", too, trying to get some details (now stuck in "thrd_sleep").

So, am I just tripping over some old long-fixed bug?  Or is this a new one
worth investigating?  The kernel is from 1999/06/16 (just before the
vfs_cluster.c commit).

Well, it's happened again, but this time it is a recent -current, less than
a day old.  After a couple hours of heavy paging (yes, this is a slow box),
the make world hangs with cc1 in "objtrm".  All the other processes seem to
be waiting for it to exit.  It's the only cc1 around, by the way, even
though it was a -j5 parallel compile.

All other machine functions are fine.  ps, top, vmstat, et al show normal
looking values.  Does anybody have any hints on how to debug this?  I know
that "objtrm" implies that paging is in progress on some object, even
though there's no paging happening, and so it's probably an accounting
error with object->paging_in_progress.  But other than that, I'm not sure
where to look.

Stephen.


Re: Stuck in objtrm

1999-07-06 Thread Stephen McKay


On Tuesday, 6th July 1999, Andrew Gallatin wrote:

Yes.  say 'proc pidhashtbl[PID & pidhash]->lh_first' in kgdb.
I suspect that it will be in exit() also..

Magic!

It looks like a plain old exit() to me.

(kgdb) proc pidhashtbl[27157 & pidhash]->lh_first
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:827
#1  0xc014a5bd in tsleep (ident=0xc32ea21c, priority=4, 
wmesg=0xc023db84 "objtrm", timo=0) at ../../kern/kern_synch.c:443
#2  0xc01e9741 in vm_object_terminate (object=0xc32ea21c)
at ../../vm/vm_object.h:230
#3  0xc01e96f1 in vm_object_deallocate (object=0xc32ea21c)
at ../../vm/vm_object.c:382
#4  0xc01e6acb in vm_map_entry_delete (map=0xc3047440, entry=0xc3240190)
at ../../vm/vm_map.c:1680
#5  0xc01e6c89 in vm_map_delete (map=0xc3047440, start=0, end=3217022976)
at ../../vm/vm_map.c:1783
#6  0xc01e6d1d in vm_map_remove (map=0xc3047440, start=0, end=3217022976)
at ../../vm/vm_map.c:1808
#7  0xc0141d20 in exit1 (p=0xc322f0a0, rv=0) at ../../kern/kern_exit.c:220
#8  0xc0141b24 in exit1 (p=0xc322f0a0, rv=-1021614488)
at ../../kern/kern_exit.c:106
#9  0xc020e41a in syscall (frame={tf_fs = 47, tf_es = 137297967, 
  tf_ds = -1078001617, tf_edi = 136021320, tf_esi = 0, 
  tf_ebp = -1077947348, tf_isp = -1020915756, tf_ebx = -1, 
  tf_edx = 135690384, tf_ecx = 136200192, tf_eax = 1, tf_trapno = 12, 
  tf_err = 2, tf_eip = 135656524, tf_cs = 31, tf_eflags = 582, 
  tf_esp = -1077947368, tf_ss = 47}) at ../../i386/i386/trap.c:1056
#10 0xc0202cc0 in Xint0x80_syscall ()
error reading /proc/27157/mem


Stuck in objtrm

1999-07-02 Thread Stephen McKay


I have an old 486 here that I thrash to death occasionally.  Well, at least
I try to get it to page to death.  I started a make world last week and
forgot about it.

Today I noticed that it's been stuck for most of the week.  Almost everything
is fine, but one cc1 process is stuck in "objtrm".  Oh, and I hung a "cat
/proc/31624/map", too, trying to get some details (now stuck in "thrd_sleep").

So, am I just tripping over some old long-fixed bug?  Or is this a new one
worth investigating?  The kernel is from 1999/06/16 (just before the
vfs_cluster.c commit).

Stephen.


Re: ctm-mail cvs-cur.5292.gz 18/82

1999-05-04 Thread Stephen McKay

On Sunday, 2nd May 1999, Chuck Robey wrote:

On Mon, 3 May 1999, Jean-Marc Zucconi wrote:

 This one did not arrive in my mailbox. Can someone send it to me? I
 would like to avoid downloading 6Mbytes again.

I'm going to mail it to you separately, but it might not look like it
came from me.

I also did not receive part 18.  Are the individual parts kept anywhere
for anonymous ftp access?

Failures are rare, but they hit the big updates disproportionately and
have a bigger effect on bigger updates, so it's a double lose.

Stephen.


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-current in the body of the message

Re: have live system with NFS client cache problems what do i do?

1999-04-11 Thread Stephen McKay

On Sunday, 11th April 1999, Alfred Perlstein wrote:

On Sun, 11 Apr 1999, Matthew Dillon wrote:

 doing a 'file cd9660_bmap.o' on laptop (NFS client) gives me a 
 cd9660_bmap.o: MS Windows COFF Unknown CPU
 
 An MS Windows binary?  Do you have any msdos mounts on
 the client or server?  How is /usr/obj mounted?

no i have no msdos mounted filesystems, i do however have an
unmounted win98 partition and a cdrom with joliet extensions mounted
however the cdrom only contains mp3s.

This is a red herring:

$ dd if=/dev/zero of=foo count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000114 secs (4487949 bytes/sec)
$ file foo
foo: MS Windows COFF Unknown CPU
$

Look for the usual pack-of-nulls corruption instead.
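
A check for that kind of corruption is easy to sketch in C.  The function
name is made up here, and how the suspect block gets read from the file is
left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first len bytes of buf are all NUL, otherwise 0. */
int
is_all_nul(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

Running it over each 512 byte block of a suspect object file would flag
the same blocks that file(1) misreads as "MS Windows COFF".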

Stephen.


Re: have live system with NFS client cache problems what do i do?

1999-04-11 Thread Stephen McKay

On Sunday, 11th April 1999, Brian Feldman wrote:

This has nothing to do with DOS. In case you didn't get my other hint:
{/home/green}$ dd if=/dev/zero count=1 2>/dev/null | file -
standard input:  MS Windows COFF Unknown CPU

Don't ya just hate it when your mail is slow!  Sigh...

Stephen.


Slightly wonky auto memory probe + fix

1999-04-06 Thread Stephen McKay

[I posted this to -current because the technology is the same in -current
even though this box will never run -current.  Bear with me.]

We've just got a new Dell PowerEdge (very nice) with 512MB of ram.  By
default, 3.1-stable sees only 64MB.  Looking carefully, it sees 8KB less
than 64MB, so it doesn't probe for the rest.

I applied this patch, which fiddles the "Hmm got 64MB so probe for the
rest" heuristic.  With this patch, it found all 512MB, to the exact byte.
Unfortunately, it kinda changes it from a heuristic to a hack. :-(


--- machdep.c   Fri Feb 19 15:31:36 1999
+++ /tmp/sgm/machdep.c  Tue Apr  6 23:40:36 1999
@@ -1428,7 +1428,7 @@
 	 * the MAXMEM option or the npx0 msize, then don't do the speculative
 	 * memory probe.
 	 */
-	if (Maxmem >= 0x4000)
+	if (Maxmem >= 0x3f00)
 		speculative_mprobe = TRUE;
 	else
 		speculative_mprobe = FALSE;
@@ -1538,7 +1538,7 @@
 		if (phys_avail[pa_indx] == target_page) {
 			phys_avail[pa_indx] += PAGE_SIZE;
 			if (speculative_mprobe == TRUE &&
-			    phys_avail[pa_indx] >= (64*1024*1024))
+			    phys_avail[pa_indx] >= (63*1024*1024))
 				Maxmem++;
 		} else {
 			pa_indx++;
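
The threshold arithmetic can be checked with a small C sketch.  It assumes
Maxmem counts 4KB pages, which is inferred from the patch pairing 0x4000
with 64MB and 0x3f00 with 63MB; the helper names are made up here.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL	/* assumed i386 page size */

/* Convert a page count to bytes. */
unsigned long
pages_to_bytes(unsigned long pages)
{
	return pages * TOY_PAGE_SIZE;
}

/* The test from the patch: probe only if enough memory was reported. */
int
should_probe(unsigned long maxmem_pages, unsigned long threshold_pages)
{
	assert(threshold_pages > 0);
	return maxmem_pages >= threshold_pages;
}
```

A machine reporting 8KB less than 64MB has 0x3ffe pages, so it fails the
old 0x4000 test and passes the new 0x3f00 one.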

Stephen.


Re: EGCS breaks what(1)

1999-04-05 Thread Stephen McKay

On Monday, 5th April 1999, Matthew Dillon wrote:

:char sccs[] = { '@', '(', '#', ')' };
:char version[] = "blahhhfoo";
:Was contiguous.

'what' is broken.  C does not impose any sort of address ordering
restriction on globals or autos that are declared next to each other.   

Well, it's really an abuse of 'what', and not anything wrong with 'what'
itself.  It will continue to work fine doing the job it was designed to do.

The NetBSD folks faced this problem some time ago, and I believe their
solution was to duplicate the version information.  So, version[] is the
same as it used to be, and sccs[] is 4 bytes longer than version[] to hold
a complete copy, and the @(#) prefix.  This is then completely portable.
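
As I understand that approach, it would look something like the sketch
below.  I have not checked the NetBSD sources, so the macro and helper
names are invented here; the version text is the placeholder from the
quoted example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_VERSION "blahhhfoo"	/* placeholder version text */

/* Plain version string, unchanged for existing users. */
const char version[] = TOY_VERSION;

/* Separate copy with the what(1) marker in the same array, so the */
/* marker and the text are contiguous whatever the compiler does. */
const char sccs[] = "@(#)" TOY_VERSION;

/* Return the text after a leading "@(#)" marker, or NULL if absent. */
const char *
what_payload(const char *s)
{
	assert(s != NULL);
	return strncmp(s, "@(#)", 4) == 0 ? s + 4 : NULL;
}
```

The helper only looks at the start of one string, which is enough to show
the layout; what(1) itself scans the whole file for the marker.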

Alternately, we could jimmy around with the current hack, and prefix it
with 4 NULs, and see what happened.  Sorry, I haven't tested this idea, as
I've not yet made the EGCS jump.

Stephen.


Re: Possible fix for rc.conf

1999-03-21 Thread Stephen McKay

On Sunday, 21st March 1999, Richard Wackerbarth wrote:

Why do we need to have ANY of the file inclusion in /etc/defaults/rc.conf?
Shouldn't that file simply be definitions of variables?
IMHO, the logic should be in rc itself.

Yeah!  What he said!

Having code in rc.conf sucks.  If there is no logic, there can be no
recursion.  If you are going to mix code into rc.conf you may as well
just suck it back into /etc/rc and get rid of it entirely. (*)

Stephen.

(*) Which is silly, of course.


panic: vm_object_qcollapse(): object mismatch

1999-02-04 Thread Stephen McKay

Hardware: 486DX2/66 16Mb ram, aha1542CF, 2x1Gb SCSI disks
Software: 4.0-current 1-2 days old, softupdates
  (vm_map.c is at rev 1.146, for example)

I was running 'make -j5 buildworld'.  It swaps like crazy when I do this. :-)

Here's what gdb -k tells me:

...
#9  0xf01425e0 in panic (
    fmt=0xf0225c1f "vm_object_qcollapse(): object mismatch")
at ../../kern/kern_shutdown.c:446
#10 0xf01e0772 in vm_object_qcollapse (object=0xf2f001d0)
at ../../vm/vm_object.c:1011
#11 0xf01e08d6 in vm_object_collapse (object=0xf2f001d0)
at ../../vm/vm_object.c:1102
#12 0xf01ddae2 in vm_map_copy_entry (src_map=0xf2f4aa00, dst_map=0xf2f4ad00, 
src_entry=0xf2ed0e10, dst_entry=0xf2f8edc0) at ../../vm/vm_map.c:2284
#13 0xf01ddd73 in vmspace_fork (vm1=0xf2f4aa00) at ../../vm/vm_map.c:2411
#14 0xf01da833 in vm_fork (p1=0xf2f7db20, p2=0xf2d751e0, flags=20)
at ../../vm/vm_glue.c:231
#15 0xf013d4f0 in fork1 (p1=0xf2f7db20, flags=20) at ../../kern/kern_fork.c:447
#16 0xf013ce65 in fork (p=0xf2f7db20, uap=0xf3021f94)
at ../../kern/kern_fork.c:99
#17 0xf01fe783 in syscall (frame={tf_es = 134807599, tf_ds = -272695249, 
  tf_edi = 134750909, tf_esi = 134935201, tf_ebp = -272643652, 
  tf_isp = -217964572, tf_ebx = 4, tf_edx = 672250004, tf_ecx = 19, 
  tf_eax = 2, tf_trapno = 12, tf_err = 2, tf_eip = 671826564, tf_cs = 31, 
  tf_eflags = 662, tf_esp = -272651296, tf_ss = 47})
at ../../i386/i386/trap.c:1100
#18 0xf01f4e9c in Xint0x80_syscall ()
...
(kgdb) p *p
$1 = {pageq = {tqe_next = 0xf02c5240, tqe_prev = 0xf02e4e00}, hnext = 0x0, 
  listq = {tqe_next = 0xf02e59d0, tqe_prev = 0xf2f69cc8}, object = 0xf2f69cb0, 
  pindex = 30, phys_addr = 15065088, queue = 4, flags = 1, pc = 0, 
  wire_count = 0, hold_count = 0, act_count = 27 '\e', busy = 0 '\000', 
  valid = 255 'ÿ', dirty = 255 'ÿ'}
(kgdb) p object
$2 = (struct vm_object *) 0xf2f001d0
(kgdb) p *object
$3 = {object_list = {tqe_next = 0xf2fdc2b8, tqe_prev = 0xf2f69c3c}, 
  shadow_head = {tqh_first = 0x0, tqh_last = 0xf2f001d8}, shadow_list = {
tqe_next = 0x0, tqe_prev = 0xf2f69cb8}, memq = {tqh_first = 0xf02dbcb0, 
tqh_last = 0xf02cc86c}, generation = 11690, type = OBJT_DEFAULT, 
  size = 32, ref_count = 2, shadow_count = 0, pg_color = 0, 
  hash_rand = -136756254, flags = 8576, paging_in_progress = 0, behavior = 0, 
  resident_page_count = 6, cache_count = 0, wire_count = 0, 
  backing_object = 0xf2f69cb0, backing_object_offset = 0x, 
  last_read = 0, pager_object_list = {tqe_next = 0xf2f69000, 
tqe_prev = 0xf0252f10}, handle = 0x0, un_pager = {vnp = {
  vnp_size = 0x}, devp = {devp_pglist = {tqh_first = 0x0, 
tqh_last = 0x0}}, swp = {swp_bcount = 0}}}
(kgdb) p *(p->object)
$4 = {object_list = {tqe_next = 0xf2f915e4, tqe_prev = 0xf30fd0e8}, 
  shadow_head = {tqh_first = 0xf2f001d0, tqh_last = 0xf2f001e0}, 
  shadow_list = {tqe_next = 0x0, tqe_prev = 0xf30fef04}, memq = {
tqh_first = 0xf02e7170, tqh_last = 0xf02cff5c}, generation = 10219, 
  type = OBJT_SWAP, size = 32, ref_count = 3, shadow_count = 1, pg_color = 0, 
  hash_rand = -136000830, flags = 384, paging_in_progress = 0, behavior = 0, 
  resident_page_count = 4, cache_count = 1, wire_count = 0, 
  backing_object = 0x0, backing_object_offset = 0x, 
  last_read = 29, pager_object_list = {tqe_next = 0xf30fad24, 
tqe_prev = 0xf30f0814}, handle = 0x0, un_pager = {vnp = {
  vnp_size = 0x0001}, devp = {devp_pglist = {tqh_first = 0x1, 
tqh_last = 0x0}}, swp = {swp_bcount = 1}}}


I'll keep this dump around.  What other details do people want?

I'm not likely to even get to look at this let alone solve it.  Bummer.

Stephen.
