Recovery? recent make world rendered system unusable (64 bit change)

2003-11-18 Thread Chris Shenton
I've been running 5.1-CURRENT for a while and a couple nights ago did
a make world.  After a couple hours building, my system was
unusable.  Critical binaries like rm, ls, mtree, sh failed,
reporting "Exec format error".  I can't log in, and I can no longer
even boot single user.

I've hosed my system and am looking for a way to recover without
having to reinstall everything and overwrite critical data and system
config files.  Naturally, I only discovered the note in UPDATING after
I trashed my system -- in fact, I read it from the loader's OK prompt
using its "more" command.  Doh!

  20031112: The statfs structure has been updated with 64-bit fields
  to allow accurate reporting of multi-terabyte filesystem sizes. You
  should build world, then build and boot the new kernel BEFORE doing
  a `installworld' as the new kernel will know about binaries using
  the old statfs structure, but an old kernel will not know about the
  new system calls that support the new statfs structure. [...]
  Running an old kernel after a `make world' will cause programs such
  as `df' that do a statfs system call to fail with a bad system
  call. [...]  DO NOT make installworld after the buildworld w/o
  building and installing a new kernel FIRST.  You will be unable to
  build a new kernel otherwise on a system with new binaries and an
  old kernel.

I'm looking for recommendations on how to recover, hopefully without
trashing my critical system files like /etc/passwd.  Ideally, I guess
I'd like a way to replace all the broken binaries and any related
libraries without overwriting other files.   

If I do a floppy-based install and then select Custom/Expert then
request a minimal install, I presume it will install a small set of
binaries but also overwrite /etc/passwd, /etc/ssh/* and so on.  Is
there a way to have it just update binaries and libraries?

If I have to, I could add another disk to this box.  Then I could do a
floppy install of 5.x on to that new disk.  Then I could boot it, and
mount the old disk's partitions. Then install the new install's
binaries on the old partitions.  Or perhaps I could do a make
buildworld, kernel, installworld the proper way, using the old disk's
partitions as the target.

Or could I -- somehow -- push a 64-bit-aware kernel onto this box so
that the newly broken binaries will work again? How?  Again, I've
got no shell access any more so everything's gonna have to be done
from floppy or maybe CD if I can borrow a burner. Naturally, this is
my net boot server for my diskless clients so I can't go that route
either. :-(

Any other suggestions?  Thanks.



___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Recovery? recent make world rendered system unusable (64 bit change)

2003-11-18 Thread Chris Shenton
masta [EMAIL PROTECTED] writes:

 The easy way is to grab a recent livecd from the jp snapshot service.
 [ http://livecd.sourceforge.net/ ]

 With the jpsnap livecd I was able to boot, copy all the working
 binaries from the cdrom over the corrupt binaries on the local HDD. I
 suggest you try the same idea.

That seems like a nice suite, but the site says it acts like a 4.6
repair, so I don't think the binaries would be suitable for replacing
my damaged 5.1 commands.  :-(




Re: Recovery? recent make world rendered system unusable (64 bit change)

2003-11-18 Thread Chris Shenton
Barney Wolff [EMAIL PROTECTED] writes:

 Re-install/upgrade from a cd.  Upgrade should leave your files alone.

Thanks, Barney -- that's what I did and it saved my butt.

A few folks suggested either LiveCD images or fixit functionality.
I was kinda dead in the water and didn't think I could download a
LiveCD and burn it from another system.  I played with the floppy
fixit functionality a bit but didn't see a way to preserve /etc and
such.  

So I used a 5.1-RELEASE CD I had and used the UPGRADE option which
promised to save my /etc stuff.  I specified my old mount points
(fortunately, I was able to read /etc/fstab from the boot OK prompt
and make paper notes!).  I then tried -- twice -- to install the
minimal system from the CD and both times it kernel panic'd with a
page fault (in process bufdaemon, last time).

For grins, I again specified my mounts (only /, /var, /tmp, /usr; I
didn't bother with /home and /usr/local), and told it to install via
FTP. Surprisingly, this worked -- no panic.

It appears to have installed a working kernel, /bin, /usr/bin, and
friends and now I'm running again.  I'm now doing a make buildworld
and then will do a make kernel KERNCONF=MyKernelDefinitionFileName,
then finally a make installworld per the UPDATING file.
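
The order that the UPDATING note asks for can be sketched as a small script. This is a minimal sketch rather than a tested procedure: the run helper, the DRY_RUN switch, the two phase names and the GENERIC default are assumptions made for illustration; only the three make targets come from the messages above.

```shell
#!/bin/sh
# Sketch of the upgrade order from the UPDATING note: build world, install
# and boot the new kernel, and only then install world.
# KERNCONF's default and the DRY_RUN switch are assumptions for this example.
KERNCONF="${KERNCONF:-GENERIC}"
DRY_RUN="${DRY_RUN:-yes}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "yes" ]; then
        echo "$*"
    else
        "$@"
    fi
}

upgrade_phase1() {
    # Build the new world and build/install the new kernel.
    # The old userland binaries stay in place during this phase.
    run cd /usr/src &&
    run make buildworld &&
    run make kernel "KERNCONF=$KERNCONF"
}

upgrade_phase2() {
    # Run only after rebooting into the kernel installed by phase 1.
    run cd /usr/src &&
    run make installworld
}
```

With DRY_RUN=yes (the default here) both functions only print the commands they would run. The split into two functions is deliberate: the reboot into the new kernel belongs between them.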

I'd never used FreeBSD's Upgrade option before, even though I've been
using FreeBSD heavily since 2.2.x.  It's a good thing.

Many thanks to everyone who replied.

I promise I'll scan UPDATING before doing a make *world next time!




How to create distribution for later NFS sysinstall on other box?

2003-10-14 Thread Chris Shenton
Some of my systems are 5.1-CURRENT but I still have some older 4.x
boxes.  I'd like to upgrade them to the same OS as my 5.1 boxes.

It seems stupid to feed them boot floppies then FTP the OS across the
WAN from freebsd.org or mirrors.  

I expect there's a way to build a distribution on my main 5.1 system
then use sysinstall on the target 4.x to install via NFS (or FTP
or...) over the LAN.  I have not found any pointers on doing this in
the Handbook or a couple quick Googles (perhaps I'm searching on the
wrong terms).  Seems it should be something like this on the server:

  cd /usr/src
  make distribution

I'd like to make the distribution based on my 5.1-CURRENT, rather than
copying/creating a 5.1-RELEASE image so I won't have to do a
subsequent update to get it CURRENT.

Any pointers? If I'm missing obvious docs, just tell me where to RTFM. :-)

Thanks.


make buildworld: Signal 11; Illegal instruction

2003-07-31 Thread Chris Shenton
I'm trying to do a make buildworld on my system:

 FreeBSD PECTOPAH.shenton.org 5.1-CURRENT FreeBSD 5.1-CURRENT #2: Tue
 Jul 1 19:48:37 EDT 2003
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/PECTOPAH i386

And it keeps dying at various points early in the build. It's a
different location each time, sometimes as soon as 12 seconds in,
sometimes as long as 100 seconds.   Most of the time it's a Signal 11, e.g.:

  rm -f .depend GPATH GRTAGS GSYMS GTAGS
  === games/pom
  *** Signal 11

But sometimes it complains about Illegal instruction:

  === rescue/rescue/client
  rm -f dhclient clparse.o dhclient.o dhclient.conf.5.gz dhclient.leases.5.gz dhclient.8.gz dhclient-script.8.gz dhclient.conf.5.cat.gz dhclient.leases.5.cat.gz dhclient.8.cat.gz dhclient-script.8.cat.gz
  Illegal instruction (core dumped)
  *** Error code 132

This smells like a hardware problem to me.  Oddly, this is the first
off-the-shelf box I've bought in years.  A Dell 600sc with CERC RAID
controller, 256MB DELL RAM.  To this, I added 512MB Crucial RAM.

I've seen this before in heavy builds (mozilla, openoffice, x11) but
now it's really buggin' me.  I'm kinda stuck if I can't make world.

Suggestions? If you think it's marginal HW, do you have any
suggestions on how to test and determine the culprit?
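
One rough way to look for marginal hardware is to repeat the same deterministic work and see whether the result ever changes. The following is a minimal sketch, assuming a POSIX shell and cksum(1); the function name and the default pass count are made up for this example. A mismatch points strongly at hardware, but a clean run proves little, since a small file is served from cache and exercises very little RAM.

```shell
# Repeatedly checksum the same file; on sound hardware every pass must agree.
# Usage: stress_checksum FILE [PASSES]
# Prints "ok" if all passes match, or "mismatch on pass N" and returns 1.
stress_checksum() {
    file=$1
    passes=${2:-10}
    first=$(cksum < "$file") || return 2
    i=1
    while [ "$i" -le "$passes" ]; do
        now=$(cksum < "$file") || return 2
        if [ "$now" != "$first" ]; then
            echo "mismatch on pass $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "ok"
}
```

Pointing it at a file larger than installed RAM, and repeating the run with one memory module removed at a time, would be one way to narrow down a suspect part.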

Thanks.


Re: make buildworld: Signal 11; Illegal instruction

2003-07-31 Thread Chris Shenton
Chris Shenton [EMAIL PROTECTED] writes:

   *** Signal 11
... 
   Illegal instruction (core dumped)
   *** Error code 132

Also seeing

*** Signal 4

if it matters.  This sounds way too flakey to be SW.


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-23 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 Try the very untested patch below ...

Well, it seems to be working now, but not necessarily due to this
patch.  I lost two of the four drives on my ATA RAID card (RAID-5) so
lost my entire system :-(.

Rebuilt the box from the 5.0-RELEASE floppies/net then cvsupped to
5.1-CURRENT.  Reinstalled all the stuff like qmail and apache.  I'm
no longer seeing the unlocked messages in the logs.

Thanks for all your help!




Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-18 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 Try the very untested patch below ...

 RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v

When I do the patch, how much of the OS do I need to rebuild, just do
a make install in the .../src/sys/kern dir?  Rebuild the OS from
the top dir? Rebuild the kernel?  I want to make sure I'm giving this
a proper test.

Thanks.


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-18 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 Try the very untested patch below ...

 RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
 retrieving revision 1.150
 diff -u -r1.150 uipc_syscalls.c
 --- uipc_syscalls.c   12 Jun 2003 05:52:09 -0000  1.150
 +++ uipc_syscalls.c   18 Jun 2003 03:14:42 -0000
 @@ -1775,10 +1775,13 @@
         */
        if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
                goto done;
 +      vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
        if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
                error = EINVAL;
 +              VOP_UNLOCK(vp, 0, td);
                goto done;
        }
 +      VOP_UNLOCK(vp, 0, td);

Tried it, rebuilt kernel, rebooted, no effect :-(

You were correct about apache using it.  Doing a simple

  fetch http://pectopah/

causes the error, dropping me into ddb if panic enabled. A tr shows
the same trace as I submitted yesterday :-(

Time to find that null modem cable.

Thanks.


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-17 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 Thanks for doing the testing.  I just committed this patch.

Seems fine here too -- many thanks.


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-17 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 I doubt it.  I checked in a fix for this problem today so you should get
 the fix when you next cvsup.

Yup, many thanks.

 Can you break into ddb and do a ps to find out what state all the
 processes are in?

I'm a newbie to ddb.  Was able to get a ps from a hung system but
didn't know how to capture it to send to you.  Any hints?


 You might want to try adding the DEBUG_VFS_LOCKS options to your
 kernel config to see if that turns up anything.

Oh, man, I'm getting killed here now. Rebuilt the kernel with that
option (not found in GENERIC or other examples in /usr/src/sys/i386/conf/).

Now the system is dropping into ddb every minute or so with complaints
like the following on the screen, and in /var/log/messages:

Jun 17 21:06:08 PECTOPAH kernel: VOP_GETVOBJECT: 0xc584eb68 is not locked but should be
Jun 17 21:08:04 PECTOPAH last message repeated 3 times
...
Jun 17 21:18:55 PECTOPAH kernel: VOP_GETVOBJECT: 0xc59346d8 is not locked but should be
Jun 17 21:18:59 PECTOPAH last message repeated 5 times

Lots 'n' lots of 'em, with a few of the same hex value then another
set for a different hex value.

 There is also a ddb command to list the locked vnodes: "show lockedvnods".

After I type cont at ddb a few times the system runs for a while
again, only to repeat.  When it drops to ddb again that show command
doesn't list anything. 

I may have to remove that option from my kernel just to get to run a
bit, even tho eventually the system will hang.  It's (of course) my
main box which the other systems NFS off, mail server, etc. :-(


 Are you using nullfs or unionfs which are a bit fragile?

Nope.  I'd be happy to mail you my kernel config if you want. I've
posted it to http://chris.shenton.org/PECTOPAH but if the system's
hung again, naturally it won't be available :-(


Thanks for your help.  Any other things I might try?

Dunno if this matters, but I'm using a DELL CERC ATA RAID card with
disks showing up as amrd*.  It was flawless at
5.0-{CURRENT,RELEASE}.


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-17 Thread Chris Shenton
Oh, FWIW, I did a cvsup and rebuilt the OS and kernel then did a
mergemaster about 30 minutes ago in order to get your fix to my qmail
issue.  So I'm running about as CURRENT as possible.


Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepablelocks

2003-06-17 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 If you have another machine and a null modem cable you can redirect the
 system console of the machine to be debugged to a serial port and run
 some comm software on the other machine so that you can capture all the
 output from ddb.

OK, I'll give that a shot, probably tomorrow.


 At the ddb prompt, you can do a tr command to get a stack trace,
 which is likely to be very helpful in pointing out the offending
 code.

Just saw it again, did a tr.  From chicken-scratch notes, the last
bits are:

  VOP_GETVOBJECT(...)
  do_sendfile(...)
  sendfile(...)
  syscall(...)
  Xint0x80_syscall...
  --- syscall( 393, FreeBSD ELF32, sendfile) ...

The next time it dropped into ddb, same sendfile thing.

The main services I'm running are qmail, apache, and NFS.  Also 
tftp, rarpd, lpd, sshd, bootparamd ...  oh, well, I guess I'm running
a bunch of stuff here. :-(  Not sure which one, if any, this would be.

Unless sendfile() is something in the OS?


I'll have to dig up a nullmodem and grab console output.  I realise
I'm not giving enough detailed info to be very helpful here.


 If you are running the NFS *client* code on this machine, there is one
 lock assertion that is easy to trigger. 

In my kernel config I have this, because a diskless box uses the same
kernel, but my /etc/fstab doesn't mount anyone else's NFS exports.

options NFSCLIENT   #Network Filesystem Client

[EMAIL PROTECTED]101 ps -axww|grep nfs
   42  ??  IL 0:00.00  (nfsiod 0)
   43  ??  IL 0:00.00  (nfsiod 1)
   44  ??  IL 0:00.00  (nfsiod 2)
   45  ??  IL 0:00.00  (nfsiod 3)
  428  ??  Is 0:00.03 nfsd: master (nfsd)
  429  ??  I  0:00.09 nfsd: server (nfsd)
  430  ??  I  0:00.00 nfsd: server (nfsd)
  431  ??  I  0:00.00 nfsd: server (nfsd)
  432  ??  I  0:00.00 nfsd: server (nfsd)
35366  p0  R+ 0:00.00 grep nfs

 At the ddb prompt you should be able to use the write command to tweak a
 couple of variables to modify this behavior.  If you set the
 vfs_badlock_panic variable to zero, the kernel will no longer drop into
 DDB when one of these lock violations occurs.  If you set the
 vfs_badlock_print variable to zero, the kernel will stop printing the
 warnings.

OK, I've done a

  examine vfs_badlock_panic

which shows it zero, then

  write vfs_badlock_panic 0

at least for now.

Thanks again.


5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks

2003-06-16 Thread Chris Shenton
(I don't know if this has any relation to the problems I reported
yesterday with qmail-send consuming 100% cpu after 5.0 to 5.1 upgrade.)

After booting 5.1-CURRENT the system runs fine for a while.  Then
later most disk i/o related actions seem to hang.  E.g., system works
but when cron kicks off a glimpseindex in the middle of the night, the
system is useless by the morning.  If I login on the console as me, it
takes my username and password then hangs (trying to run
/usr/local/bin/bash?). If I do this as root, I do get a shell
(/bin/csh).  After a point, asking for top will hang, even as root.
Even a reboot hung this morning with nothing in the logs.

The system has become almost unusable because of this, requiring
frequent reboots or hardware resets.

Sometimes when I do something as simple as ps I see this ominous
message on the console:

  sysctl_old_user() with the following non-sleepablelocks held:
  exclusive sleep mutex process lock r = 0 (0xc50bc9e0) locked @ /usr/src/sys/kern/kern_proc.c:258

which gets into /var/log/messages as:

  Jun 16 08:33:48 PECTOPAH kernel: exclusive sleep mutex process lock r = 0 (0xc50c7618) locked @ /usr/src/sys/kern/kern_proc.c:258

There are a bunch of these.

That file is version:

  $FreeBSD: src/sys/kern/kern_proc.c,v 1.189 2003/06/14 06:20:25 alc Exp $

and the line is the PROC_LOCK() portion of:

  struct proc *
  pfind(pid)
          register pid_t pid;
  {
          register struct proc *p;

          sx_slock(&allproc_lock);
          LIST_FOREACH(p, PIDHASH(pid), p_hash)
                  if (p->p_pid == pid) {
                          PROC_LOCK(p);
                          break;
                  }
          sx_sunlock(&allproc_lock);
          return (p);
  }

Any thoughts? Thanks.


qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-15 Thread Chris Shenton
I've been running qmail for years and like it, installed pretty much
per www.LifeWithQmail.org.  My main system was running FreeBSD
5.0-RELEASE and -CURRENT and qmail was fine.  When I just upgraded to
5.1-CURRENT a couple days back, the qmail-send process started using
all CPU.

  last pid: 22793;  load averages:  1.06,  1.02,  1.00  up 0+08:13:46  20:36:32
  74 processes:  2 running, 72 sleeping

  Mem: 38M Active, 51M Inact, 84M Wired, 28K Cache, 73M Buf, 452M Free
  Swap: 2048M Total, 2048M Free

    PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
    615 qmails   132    0  1228K   616K RUN    483:00 96.88% 96.88% qmail-send

I noticed an identical complaint on the qmail list, to which there have so
far been no replies (except you should ask the FreeBSD list):

From: Luca Morettoni [EMAIL PROTECTED]
Subject: qmail on FreeBSD 5.1-CURRENT
To: [EMAIL PROTECTED]

[...] qmail is run under daemontools and all work fine (the configuration
is 2 years old!), but when I delivery the first mail (localy or remote)
the qmail-send process fire up to 100% of CPU infinitely

All other mail are right delivery, and the CPU use is the only problem, I
see in qmail-send.c that select() function, after the first message,
allways return 1

A truss shows me it's running in a tight loop over this code:

open("lock/trigger",0x4,027757775230)            = 8 (0x8)
stat("todo",0xbfbffa00)                          = 0 (0x0)
open("todo",0x4,01)                              = 9 (0x9)
fstat(9,0xbfbffa00)                              = 0 (0x0)
fcntl(0x9,0x2,0x1)                               = 0 (0x0)
fstatfs(0x9,0xbfbff900)                          = 0 (0x0)
getdirentries(0x9,0x8059000,0x1000,0x805a214)    = 512 (0x200)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
getdirentries(0x9,0x8059000,0x1000,0x805a214)    = 0 (0x0)
lseek(9,0x0,0)                                   = 0 (0x0)
close(9)                                         = 0 (0x0)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)
gettimeofday(0xbfbffbc8,0x0)                     = 0 (0x0)
close(8)                                         = 0 (0x0)
open("lock/trigger",0x4,027757775230)            = 8 (0x8)

I see nothing besides usual message delivery information in qmail's logs.

Failing that, I rebuilt qmail and it seemed to have fixed it, but I didn't
wait long enough: it's pegged at 100% CPU, constantly.  If what Luca says is
true, maybe it hadn't sent a message yet.

Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause
this?  Any thoughts on how I might go about diagnosing this any better?
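
One way to make a tight loop like this easier to read is to reduce a saved truss log to a per-syscall count. This is a minimal sketch, assuming the log has one call per line in the name(args) = result form shown above; the function name syscall_histogram is made up for this example.

```shell
# Summarise a truss log read from stdin: count how often each system call
# appears and print "count name" pairs, most frequent first.
syscall_histogram() {
    # Keep only the call name in front of the opening parenthesis.
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)(.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

A log could be captured with something like truss -o /tmp/qmail.truss -p 615 (615 being the qmail-send PID from the top output above) and then fed in with syscall_histogram < /tmp/qmail.truss. A select() count close to the gettimeofday() count, with select() always returning immediately, would match the behaviour Luca describes.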

Thanks.


DELL CERC amr RAID card beeping, dead drive? how to diagnose/fix?

2003-05-29 Thread Chris Shenton
I have a DELL 600SC which came with a DELL CERC RAID controller.  It's
recognized by FreeBSD-CURRENT as an amr device even though it's got
four ATA disk channels on it instead of the documented SCSI drives for
the PERC controller.  I have 4x WD1200JB ATA 120GB disks on it which
have been running fine for a few months as a set of RAID-5 volumes.
From dmesg:

  amrd0: <LSILogic MegaRAID logical drive> on amr0
  amrd0: 9999MB (20477952 sectors) RAID 5 (optimal)
  amrd1: <LSILogic MegaRAID logical drive> on amr0
  amrd1: 111093MB (227518464 sectors) RAID 5 (optimal)
  amrd2: <LSILogic MegaRAID logical drive> on amr0
  amrd2: 111093MB (227518464 sectors) RAID 5 (optimal)
  amrd3: <LSILogic MegaRAID logical drive> on amr0
  amrd3: 111099MB (227530752 sectors) RAID 5 (optimal)

An hour ago, it started beeping at me.  I suspect this is the CERC
card warning me that one of the disk drives has failed and that I'd
better do something about it. :-(

Is there a way to diagnose it from a live system, to query which of
the four ATA drives it thinks is dead, so I can replace it?
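
The only status visible in the messages above is the state word the driver prints in parentheses when each logical drive attaches. As a rough first check, those lines can be filtered for anything that is not "optimal". This is a minimal sketch under stated assumptions: the function name is made up, "degraded" is only an example of what another state might look like, and the lines reflect the state at attach time, so they will not show a drive that failed since boot or say which physical disk is at fault.

```shell
# List amrd logical drives whose dmesg status is anything other than "optimal".
# Reads dmesg-style text on stdin; prints one "device state" pair per drive.
amr_not_optimal() {
    # Pull out the device name and the state word in the trailing parentheses.
    sed -n 's/^ *\(amrd[0-9][0-9]*\): .*RAID [0-9][0-9]* (\([^)]*\)).*/\1 \2/p' |
        awk '$2 != "optimal"'
}
```

Typical use would be dmesg | amr_not_optimal after the next reboot; empty output means every logical drive attached as optimal.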

(Seems to me that a WD1200JB drive should last a lot longer than the
few months it's been running in a properly ventilated DELL box; any ideas?)

Anyone have experience with this CERC controller and replacing a
drive?  My biggest fear is that I haven't tested the RAID rebuild and
that even when I do replace the failed (?) drive it won't do the
automatic rebuild and save my data.

Other suggestions?

Thanks.





Re: A few 5.0-Release questions...

2003-03-05 Thread Chris Shenton
John Wilson [EMAIL PROTECTED] writes:

 --- Scott Long [EMAIL PROTECTED] wrote:
 [Dell PowerEdge]
  What model?  There are quite a few PowerEdges out
 
 It's a 600SC - P4 1.8 - Perc3/SC

FWIW, I had absolutely no trouble booting and installing 5.0-R on my
600SC, with the DELL-supplied CERC RAID card (amr device recognized
it, but it drives 4x ATA disks rather than SCSI), and an Intel gigabit
ether card.  Got X11 working on it rather easily too.  
I don't have any other drives (than the supplied IDE CD) in the box.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message


Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-22 Thread Chris Shenton
Matthew Dillon [EMAIL PROTECTED] writes:

 # ls -lR /conf
 drwxr-xr-x  5 root  wheel  512 Dec 21 10:37 base
 drwxr-xr-x  3 root  wheel  512 Dec 19 21:56 default
...
 /conf/base/etc:
 -rw-r--r--  1 root  wheel  18 Dec 19 22:10 diskless_remount
 -rw-r--r--  1 root  wheel   6 Dec 19 22:22 md_size
...
 /conf/default/etc:
 -rw-r--r--   1 root  wheel   184 Feb 18 18:16 fstab
 -rw-r--r--   1 root  wheel   867 Dec 21 00:04 rc.conf
 -rw-r--r--   1 root  wheel   197 Feb 18 18:19 rc.local

I fiddled standard-supfile to get CURRENT (rather than RELENG_5_0)
and am now able to boot with a config like you describe. Thanks!

You appear to be doing as diskless(8) suggests: mount the server's /
and therefore get its /etc and boot its kernel (no need to populate a
different directory with clone_root).

But that kernel must have option BOOTP according to the manpage.  If
I recompile my server's kernel with this, the diskless client boots
but the server itself will no longer boot: it hangs sending out
bootp requests which no one answers.

Seems like diskless clients would have to have separate kernels with
the option BOOTP while any servers must omit this option.

How do you keep them separate? or am I missing something fundamental?

Thanks.

PS: could you show me your dhcpd.conf so I can see how you're
specifying your root filesystem? Mine's currently:
  option root-path "192.168.255.185:/";



Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-22 Thread Chris Shenton
Matthew Dillon [EMAIL PROTECTED] writes:

 If you do this pxeboot will attempt to load the kernel via TFTP
 instead of via NFS.  You then put your kernel in /tftpboot right along
 side a copy of pxeboot.

 This allows you to netboot a different kernel than the one in the 
 server's root directory.

Ah... [sound of lightbulb going on]

I was wondering why it would be useful to get the kernel via TFTP
rather than the NFS mount.  Makes sense.

Thanks!



Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-19 Thread Chris Shenton
Matthew Dillon [EMAIL PROTECTED] writes:

 4.x and -current use the same mechanism, except 4.x uses MFS and
 -current uses MD.

4.x uses /etc/diskless[12] while 5.x (by default) uses
/etc/rc.d/(init)?diskless.  The latter works very differently than
the former.


 Ignore the handbook.  Try 'man diskless'.

Ouch, will try the man.



 kenv is only used in current's rc.diskless scripts, and it 
 resides in /bin on -current.  

Not on mine:

  chris@Pectopah103 whereis kenv
  kenv: /usr/bin/kenv /usr/share/man/man1/kenv.1.gz /usr/src/bin/kenv
  chris@Pectopah104 ls /bin/kenv
  ls: /bin/kenv: No such file or directory
  chris@Pectopah105 uname -a
  FreeBSD Pectopah.shenton.org 5.0-RELEASE-p1 FreeBSD 5.0-RELEASE-p1 #0: Sun Feb 16 16:10:36 EST 2003 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/Pectopah  i386

And the /usr/bin/kenv needs the elf libraries to run, it's not static,
so shouldn't live in /sbin and can't run on a diskless box until /usr
is mounted.
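
Whether a given binary is static can be checked from the description that file(1) prints for it. This is a minimal sketch, assuming file(1) is installed and reports "statically linked" or "dynamically linked" for ELF executables; the function name linkage is made up for this example.

```shell
# Classify a binary as static, dynamic or unknown from file(1)'s description.
# Usage: linkage /path/to/binary
linkage() {
    # -L follows symlinks, so the target is inspected rather than the link.
    case $(file -L "$1" 2>/dev/null) in
        *"statically linked"*)  echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *)                      echo unknown ;;
    esac
}
```

Running linkage /usr/bin/kenv on the 5.0-RELEASE-p1 box above would be expected to report dynamic, which is why the binary cannot run before /usr and its libraries are mounted.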

 Basically what you do is create a files and directories in
 /conf/base and /conf/default which are used to populate the
 MFS/MD root and other directories.  I have included my setup
 at the end.

Which startup scripts are you running, old diskless[12] or new
rc.d/(init)?diskless ?

Thanks for your examples, I'll plow through them tonight. But -- more
below -- these sure look like 4.x-compatible stuff, not 5.0.

 /conf/base:
 total 5
 drwxr-xr-x  2 root  wheel  512 Dec 21 10:37 dev
 drwxr-xr-x  2 root  wheel  512 Dec 19 22:22 etc
 -rw-r--r--  1 root  wheel   11 Dec 20 15:38 etc.remove
 drwxr-xr-x  2 root  wheel  512 Dec 20 14:31 root
 -rw-r--r--  1 root  wheel   12 Dec 20 15:38 root.remove
 
 /conf/base/dev:
 total 2
 -rw-r--r--  1 root  wheel  18 Dec 21 10:37 diskless_remount
 -rw-r--r--  1 root  wheel   6 Dec 19 22:22 md_size

The etc.remove and md_size are used by 4.x's diskless[12] but NOT by
the 5.x /etc/rc.d/(init)?diskless scripts.  Are you using the old
startup rc stuff, possibly changing the default value in
/etc/defaults/rc.conf:

  rc_ng="YES"   # Set to NO to disable new-style rc scripts.

If so and it works for you, I can certainly do the same.  But I'd
still like to figure how to get the 5.x rc.d/* scripts to do their
thing.

Actually, I don't see any code to look for that md_size or
diskless_remount in either of 5.0's rc.diskless[12] or
rc.d/(init)?diskless.  I do know that what you're describing is in
4.x's rc.diskless[12], and I did have that working on a 4.7S system.
That's why I'm having so much trouble with the 5.0 diskless boot --
everything's changed.

Lemme know if I'm way off base but it sounds like you're describing a
4.x diskless boot and my problem's with 5.0.

Thanks a bunch!




Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-19 Thread Chris Shenton
Matthew Dillon [EMAIL PROTECTED] writes:

 Make sure your NFS server is exporting to your subnet and that
 it is running the necessary services, (portmap, mountd, nfsd -t
 -u -n 4). 

My boot server is 5.0, so that's the kernel my diskless box gets.  My
other boxes are 4.x boxes but I'll try.   

4.x's portmap is now 5.x's rpcbind; other processes seem fine too:

   41  ??  IL 0:00.00  (nfsiod 0)
   42  ??  IL 0:00.00  (nfsiod 1)
   43  ??  IL 0:00.00  (nfsiod 2)
   44  ??  IL 0:00.00  (nfsiod 3)
  276  ??  Ss 0:00.08 /usr/sbin/rpcbind
  339  ??  Is 0:00.08 /usr/sbin/mountd -r
  345  ??  Is 0:00.02 nfsd: master (nfsd)
  347  ??  I  2:03.22 nfsd: server (nfsd)
  348  ??  I  0:12.20 nfsd: server (nfsd)
  349  ??  I  0:06.56 nfsd: server (nfsd)
  350  ??  I  0:02.03 nfsd: server (nfsd)

 If you have another box that you can boot normally (not netboot),
 test the NFS server from that box by mounting / and /usr:
 
 other# mount 192.168.255.185:/usr /mnt


I believe I tried mounting a 4.x volume onto the diskless 5.0 box and
it failed in the same way.  I didn't take careful notes so I'll
repeat.

I can mount the 5.0 boot server's /usr onto a 4.7S client with no
problem:

  thanatos(4.7S)# mount 192.168.255.185:/usr /mnt
  thanatos(4.7S)# mount
  /dev/da0s1a on / (ufs, local)
  /dev/da0s1e on /tmp (ufs, local)
  /dev/da0s1g on /usr (ufs, NFS exported, local)
  /dev/da0s1d on /usr/local (ufs, NFS exported, local)
  /dev/da0s1f on /var (ufs, local)
  procfs on /proc (procfs, local)
  linprocfs on /usr/compat/linux/proc (linprocfs, local)
  /dev/da0s1h on /home.THANATOS (ufs, local)
  pectopah:/home on /home (nfs)
  pectopah:/usr/local on /usr/localnew (nfs)
  pectopah:/usr/X11R6 on /usr/local/X11R6 (nfs)
  192.168.255.185:/usr on /mnt (nfs)

The name pectopah is the addr 192.168.255.185 and is the 5.0 NFS
server.

So, it seems it's something broken on my 5.0 NFS client's side.  But I
can mount a 4.7S-exported filesystem onto my 5.0 boot-server so at
least its mount_nfs is OK:

  /sbin/mount_nfs 192.168.255.180:/usr /mnt
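
The same mount test can be repeated for each export in one pass. This is a minimal sketch: the function name and the MOUNT_CMD, UMOUNT_CMD and MNT variables are assumptions made for this example (they default to mount_nfs, umount and /mnt), and the server address and paths are the ones quoted in this thread.

```shell
# Try to mount each exported path from a server and report the outcome.
# Usage: check_exports SERVER PATH...
# MOUNT_CMD, UMOUNT_CMD and MNT can be overridden, e.g. for a dry run.
check_exports() {
    server=$1
    shift
    for path in "$@"; do
        if ${MOUNT_CMD:-mount_nfs} "$server:$path" "${MNT:-/mnt}"; then
            echo "ok   $server:$path"
            # Unmount again so the next export can use the same mount point.
            ${UMOUNT_CMD:-umount} "${MNT:-/mnt}"
        else
            echo "FAIL $server:$path"
        fi
    done
}
```

For example, check_exports 192.168.255.185 /usr /usr/local /home run as root on the diskless client would show which of the server's exports that client can actually mount.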


 It is also possible that someone has broken something in NFS
 recently.  The -current I am running (which works fine as
 a server for my EPIA 5000 and EPIA M 9000) is several weeks 
 old.

Hmmm, how could I check this out? I'm happy to do testing and provide
feedback. 

 If your /usr partition is on / on your server (i.e. not 
 its own partition), then remember to use the -alldirs option
 in /etc/exports for / and /usr.  If /usr is on its own
 partition you don't need -alldirs unless you are trying to
 mount a subdirectory in / or /usr.  You *might* need -alldirs
 on your / export.  In anycase, I always set -alldirs on all
 my read-only exports and that is what I would recommend you
 do too.

I've removed the readonly flags until I get this working.  I have
separate / and /usr partitions; here's my 5.0 boot-server's
/etc/exports file (Kitchen is the diskless box :-)

  /usr/local  -alldirs -maproot=root  Sisyphus Thanatos Beatnik Kitchen
  /usr        -alldirs -maproot=root  Sisyphus Thanatos Beatnik Kitchen
  /home       -maproot=root           Sisyphus Thanatos Beatnik Kitchen

And the dhcpd.conf entry which told the diskless client where to get its
/ partition from (and that part is successful):

  host Kitchen.shenton.org {
      hardware ethernet   00:40:63:c3:89:bb;
      fixed-address       kitchen.shenton.org;
      filename            "pxeboot";
      option root-path    "192.168.255.185:/usr/local/diskless";
  }
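
For reference, the root-path value above has the same server:/path
shape that mount_nfs takes as its first argument.  A minimal sh sketch
of splitting it; split_root_path is a hypothetical helper for
illustration, not part of pxeboot or any FreeBSD script:

```sh
# Hypothetical helper: split an NFS "server:/path" spec into its two parts.
split_root_path() {
    spec=$1
    server=${spec%%:*}   # everything before the first colon
    path=${spec#*:}      # everything after the first colon
    printf '%s %s\n' "$server" "$path"
}

# Prints: 192.168.255.185 /usr/local/diskless
split_root_path 192.168.255.185:/usr/local/diskless
```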

Am I correct that I only need to have mount_nfs on the diskless
client, that I do NOT need an rpcbind running on the diskless client
before issuing the mount? Since pxeboot (?) mounts / via NFS, I'm not
understanding why mount_nfs can't. 

Thanks again.

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-19 Thread Chris Shenton
Matthew Dillon [EMAIL PROTECTED] writes:

 Are you sure you have done a recent buildworld/installworld?  It
 sounds like you haven't.  In -current kenv is in /bin  (i.e.
 the source is in /usr/src/bin/kenv on -current) as of the 15th of
 this month.

Well, I'll be danged. I installed 5.0R on Saturday via FTP from
ftp*.freebsd.org, did a cvsup ... /usr/share/examples/cvsup/standard-supfile
then make world.  But I see it in /usr/src/bin/kenv now, cvsupped last
night.  Rebuilding now.  Then need to redo clone_root to populate my
diskless root hierarchy.

Thanks for the kick in the butt.

 You must be working off an out of date source tree.

Weird, perhaps I fat fingered and cvsupped stable and built that --
installing onto a Current system.  That would explain a lot of this
ugliness. 


 I have included -current's current /usr/src/etc/rc.d/initdiskless script

Thank you. 

 If your sources are out of date you should update them...  As
 you can see, the initdiskless script is full of references to
 md_size :-)
 
 # Copyright (c) 1999  Matt Dillion

Ah hah... :-)

 # $FreeBSD: src/etc/rc.d/initdiskless,v 1.23 2003/02/15 16:29:20 jhay Exp $

OK, this is weird; I cvsup daily but am two versions behind you:


  Pectopah# cvsup -l 1 -g -h cvsup2.freebsd.org /usr/share/examples/cvsup/standard-supfile
  Connected to cvsup2.freebsd.org
  Updating collection src-all/cvs
  Finished successfully

  Pectopah# grep '$FreeBSD' /usr/src/etc/rc.d/initdiskless 
  # $FreeBSD: src/etc/rc.d/initdiskless,v 1.21 2002/10/12 10:31:31 schweikh Exp $
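
The grep above can be narrowed to just the revision number, which
makes it easier to compare two trees at a glance.  A minimal sketch,
assuming the ident line has the usual "$FreeBSD: path,v REV date ..."
layout shown above; freebsd_rev is a hypothetical helper name:

```sh
# Hypothetical helper: print the CVS revision from a file's $FreeBSD$ ident line.
freebsd_rev() {
    # Capture the dotted number that follows "<path>,v " on the ident line.
    sed -n 's/.*\$FreeBSD: [^ ]*,v \([0-9.]*\) .*/\1/p' "$1"
}

# Usage on a real tree (path taken from the transcript above):
#   freebsd_rev /usr/src/etc/rc.d/initdiskless
```

Run against the file above this would print 1.21, against the copy
Matt quoted it would print 1.23.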

Being out of sync would explain a lot.  Looks like the tag in
standard-supfile points at the RELENG_5_0 branch rather than at the
source for CURRENT:

  # $FreeBSD: src/share/examples/cvsup/standard-supfile,v 1.21.2.1 2003/01/16 05:59:14 scottl Exp $
  #
  # This file contains all of the CVSup collections that make up the
  # FreeBSD-current source tree.
  ...
  *default release=cvs tag=RELENG_5_0

OK, I'll change this to tag=. and recvsup, try again.  A big doh!
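
A quick way to catch this before the next cvsup is to print the tag
a supfile asks for.  A minimal sketch, assuming the tag sits on a
"*default ... tag=VALUE" line as in the excerpt above; supfile_tag is
a hypothetical helper name:

```sh
# Hypothetical helper: print the tag= value from a supfile's *default lines.
supfile_tag() {
    # Match lines starting with a literal "*default" and capture what follows "tag=".
    sed -n 's/^\*default.*[[:space:]]tag=\([^[:space:]]*\).*$/\1/p' "$1"
}

# Usage (path taken from the transcript above):
#   supfile_tag /usr/share/examples/cvsup/standard-supfile
```

With the file as shipped above this would print RELENG_5_0; after the
edit it should print a single dot.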

Many thanks.





Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-18 Thread Chris Shenton
I was running a VIA Mini-ITX diskless box off a 4.7-STABLE box for a
while using a root fs created by the clone_root discussed in the
handbook, then some tweaks.  I'm having a heck of a time trying to get
this running under 5.0-RELEASE, now sync'd to 5.0-CURRENT as of
yesterday, then mergemastered.

If someone can provide some clues or pointers, I'd be happy to doc how
I get it to work (for the Handbook?) and could take a stab at updating
clone_root for 5.x if it's needed.


Background:

Been using FreeBSD since 2.2.x.  I can code. I can RTFM. :-)

I've read the 5.0 Release Notes and Early Adopters docs.

I've read Handbook section 19.6 Diskless Operation and it covers the
DHCP, PXE, TFTP, and NFS OK but glosses over how the diskless box
actually boots -- what scripts it runs and such. That's where I'm
stuck. How does a diskless box know to run a diskless boot script
(rather than a standard one)? I'm assuming it invokes init which runs
/etc/rc, which then runs /etc/rc.d/* in the 5.0 model.  Am I close?

I've read Handbook 7.6 Init but it doesn't actually say much about
how init hands off to the rc* scripts.  The man for rc(8) seems to
document the 5.0 rc.d/* well so I'll revisit my diskless boot process. 


Here's what I have working so far:

* isc-dhcpd: offers hostname, IP, location of boot image, root
  filesystem location 
* tftpboot: offers pxeboot
* pxeboot: gets and runs kernel, mounts root filesystem

Then it begins the init/rc startup process and eventually dies. 


Here's what I've found broken or I can't get past:

The clone_root assumes it's copying all the files it needs but (for
example) mtree now lives in /usr/sbin instead of /sbin, so it's not
copied to the diskless root area so it's not available. kenv lives
in /usr/bin, but /usr isn't mounted before kenv is used in
rc.diskless1.  clone_root wants to run /dev/MAKEDEV but that file
doesn't exist in my 5.0 /dev/; I see it in /usr/src/etc/ but it wasn't
installed by make installworld or mergemaster. (Is this a glitch?)
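
One way to spot this kind of gap after running clone_root is to list
which early-boot tools are absent from the diskless root's /bin and
/sbin.  A minimal sketch; missing_from_root is a hypothetical helper,
and the tool list in the usage line is only the set named in this
message, not a complete list of what the rc scripts need:

```sh
# Hypothetical helper: print each named tool that is not executable
# in either bin/ or sbin/ under the given root directory.
missing_from_root() {
    root=$1
    shift
    for tool in "$@"; do
        found=no
        for dir in bin sbin; do
            if [ -x "$root/$dir/$tool" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$tool"
        fi
    done
}

# Usage (root path taken from the dhcpd.conf root-path in this thread):
#   missing_from_root /usr/local/diskless mtree kenv mount_nfs
```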

The rc.diskless[12] scripts have changed significantly from 4.7 to
5.0. Are they even used with the new /etc/rc.d/* mechanism?

I've run clone_root then manually installed a DISKLESS kernel file
into the new location ($DISKLESSROOT)/boot/kernel/kernel. I've
manually populated $DISKLESSROOT/conf/default/etc/ with an
NFS-oriented fstab, rc.conf, rc.diskless*, and rc.d/[init]diskless and
password-related files.

Upon boot, after kernel loaded, console shows a bunch of rc.conf-style
vars being set, then spews some debugging which I put in
$DISKLESSROOT/conf/default/etc/rc.d/diskless, so it's running that
rather than the old /etc/rc.diskless* files. I've moved the mount -a
near the top of rc.d/diskless since it runs commands which are not
available until /usr is mounted (e.g., mtree).  The NFS mount fails
with a message I don't understand:

  [udp] pectopah.shenton.org:/usr: RPCPROG_NFS: RPC: Unknown host

This occurs whether I specify a bare hostname, fqdn or IP addr in
fstab, even if I put host info in $DISKLESSROOT/conf/default/etc/.  Is
it really complaining about hosts? or is it an rpcbind thing?  Note
that it has already mounted the root filesystem a while back.

Since it can't mount /usr, everything else fails. 


Can someone point me in the right direction ?  Thanks!




Re: Diskless: 5.0R scripts, boot, NFS mount problems I didn't have in 4.7S

2003-02-18 Thread Chris Shenton
Chris Shenton [EMAIL PROTECTED] writes:

 I've moved the mount -a near the top of rc.d/diskless since it
 runs commands which are not available until /usr is mounted
 (e.g., mtree).  The NFS mount fails with a message I don't
 understand:
 
   [udp] pectopah.shenton.org:/usr: RPCPROG_NFS: RPC: Unknown host

Tasteless self-followup:

I get the same error when the boot process fails and drops me to a
shell; I can get it with UDP or TCP mounts. For example:

  mount_nfs -U -2 192.168.255.185:/usr /mnt
  mount_nfs -U -3 192.168.255.185:/usr /mnt
  mount_nfs -T -2 192.168.255.185:/usr /mnt
  mount_nfs -T -3 192.168.255.185:/usr /mnt

The only difference is the [udp] vs [tcp] in the error msg:

  [tcp] 192.168.255.185:/usr: RPCPROG_NFS: RPC: Unknown host
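
The four attempts above are just the cross product of transport
(-U, -T) and NFS version (-2, -3).  A minimal dry-run sketch that
prints each command instead of running it, so the output can be
reviewed or piped to sh on the client; nfs_mount_matrix is a
hypothetical helper name:

```sh
# Hypothetical helper: print one mount_nfs command per transport/version pair.
nfs_mount_matrix() {
    server_spec=$1   # e.g. 192.168.255.185:/usr
    mount_point=$2   # e.g. /mnt
    for proto in -U -T; do
        for vers in -2 -3; do
            echo "mount_nfs $proto $vers $server_spec $mount_point"
        done
    done
}

nfs_mount_matrix 192.168.255.185:/usr /mnt
```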

I sniffed traffic with tcpdump and ethereal: the diskless client is
contacting the server so it's not having problems resolving that IP
addr.

I'm not hard-core enough to understand what might cause this failure,
which occurs in /usr/src/sbin/mount_nfs/mount_nfs.c:

  if (portspec != NULL) {
      /* `ai' contains the complete nfsd sockaddr. */
      nfs_nb.buf = ai->ai_addr;
      nfs_nb.len = nfs_nb.maxlen = ai->ai_addrlen;
  } else {
      /* Ask the remote rpcbind. */
      nfs_nb.buf = &nfs_ss;
      nfs_nb.len = nfs_nb.maxlen = sizeof nfs_ss;

      if (!rpcb_getaddr(RPCPROG_NFS, nfsvers, nconf, &nfs_nb,
          hostp)) {
          if (rpc_createerr.cf_stat == RPC_PROGVERSMISMATCH &&
              trymntmode == ANY) {
              trymntmode = V2;
              goto tryagain;
          }
          snprintf(errbuf, sizeof errbuf, "[%s] %s:%s: %s",
              netid, hostp, spec,
              clnt_spcreateerror("RPCPROG_NFS"));
          return (returncode(rpc_createerr.cf_stat,
              &rpc_createerr.cf_error));
      }
  }

To see if this was a portmapper/rpcbind issue, I tried doing the
client mount and specifying the port:

  mount -o port=2049 192.168.255.185:/usr /mnt

and got a slightly different error:

  [udp] 192.168.255.185:/usr: RPCMNT: clnt_create: RPC: Unknown host

and now I'm definitely over my head trying to read mount_nfs.c. :-(


I don't understand this since the client is able to mount
/usr/local/diskless to get the root filesystem and run the kernel.
But I believe pxeboot is doing this, not a full FreeBSD binary.
What's the difference in the way they mount?

Any suggestions? Thanks.
