YES! That solved the problem. I think I did run update_client. This
is a NASTY bug!

To fix my nodes, I booted each one from a rescue disk and ran:

    chroot /mnt/sysimage
    systemconfigurator --configsi

It gave me an error message about the RAM disk. I then rebooted, and
everything is running fine! Thanks to all of you for the great support.
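
For anyone who hits the same thing, the recovery can be sketched as a small script. This is only a sketch under assumptions: the device names /dev/sda1 and /dev/sda2 are placeholders (nodes on a hardware RAID controller will use different device names), and the rescue_node function and the DRY_RUN variable are made up for illustration. Only the chroot and "systemconfigurator --configsi" step comes from this thread. By default it just prints the commands.

```shell
#!/bin/sh
# Sketch of the rescue-disk recovery. Device names are placeholders;
# check your own partition layout before running anything for real.
ROOT_DEV="${ROOT_DEV:-/dev/sda2}"       # assumed root partition
BOOT_DEV="${BOOT_DEV:-/dev/sda1}"       # assumed /boot partition
SYSIMAGE="${SYSIMAGE:-/mnt/sysimage}"   # where the installed system is mounted
DRY_RUN="${DRY_RUN:-1}"                 # 1 = only print the commands

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rescue_node() {
    run mkdir -p "$SYSIMAGE"
    run mount "$ROOT_DEV" "$SYSIMAGE"
    run mount "$BOOT_DEV" "$SYSIMAGE/boot"
    # Regenerate the configuration inside the installed system,
    # as was done by hand in this thread.
    run chroot "$SYSIMAGE" systemconfigurator --configsi
}

rescue_node
```

If the rescue environment has already mounted the installed system under /mnt/sysimage, the mount steps are unnecessary and only the last command matters. Set DRY_RUN=0 to actually execute the commands.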
|-----Original Message-----
|From: Michael Chase-Salerno [mailto:[EMAIL PROTECTED]
|Sent: Wednesday, March 19, 2003 4:45 PM
|To: Chris Oubre
|Cc: Oscar-users List
|Subject: Re: [Oscar-users] RE: OSCAR cluster dead after simple changes
|
|
|Did you happen to run update_client on the nodes at
|some point? That would have wiped out some of the
|configuration done by systemconfigurator, which is
|what creates those files. The networking files are
|probably missing or incorrect as well. Running
|update_client is currently not supported on an OSCAR
|cluster for this reason.
|
|If that's the case, you should be able to run
|systemconfigurator again on the nodes from a booted
|rescue disk. You'll have to mount all the filesystems
|and run it chrooted. You can steal the proper command
|from the autoinstall script for the image in
|/var/lib/systemimager/scripts.
|
|Mike
|
|On Wed, 2003-03-19 at 17:31, Chris Oubre wrote:
|> Ok I booted into a node via rescue mode from a Red Hat 7.2
|> installation CD. I noticed that there are certain files missing.
|>
|> There is no
|> /etc/lilo.conf
|> /boot/map
|>
|> And no ramdisk
|>
|> Where does OSCAR keep these files? They are also not in the
|> /var/lib/systemimager/images/oscarimage/ tree on the master node.
|>
|> This seems odd because I get the LIL- error message, which suggests
|> to me that it is loading a lilo.conf file, but I cannot find that
|> file anywhere on the nodes.
|>
|> |-----Original Message-----
|> |From: Keller, Gregory W xNON-EMPLOYEEx
|> |[mailto:[EMAIL PROTECTED]
|> |Sent: Wednesday, March 19, 2003 12:50 PM
|> |To: '[EMAIL PROTECTED]'
|> |Cc: '[EMAIL PROTECTED]'
|> |Subject: Re: OSCAR cluster dead after simple changes
|> |
|> |
|> |Chris,
|> |The change to the switch is irrelevant. Until LILO
|> |bootstraps the kernel, your network card isn't a factor,
|> |so unless you're diskless or using PXE to boot over
|> |the network, this is an install problem on the local disks.
|> |
|> |Check out this link:
|> |http://user.fundy.net/cyclist/linux/troubleshoot-LILO.html
|> |
|> |Here is an excerpt that will point us in the right direction:
|> |  (<nothing>)  No part of LILO has been loaded. LILO either isn't
|> |     installed or the partition on which its boot sector is located
|> |     isn't active.
|> |  L <error> ...  The first stage boot loader has been loaded and
|> |     started, but it can't load the second stage boot loader. The
|> |     two-digit error codes indicate the type of problem. (See also
|> |     section "Disk error codes".) This condition usually indicates a
|> |     media failure or a geometry mismatch (e.g. bad disk parameters,
|> |     see section "Disk geometry").
|> |  LI  The first stage boot loader was able to load the second stage
|> |     boot loader, but has failed to execute it. This can either be
|> |     caused by a geometry mismatch or by moving /boot/boot.b without
|> |     running the map installer.
|> |  LIL  The second stage boot loader has been started, but it can't
|> |     load the descriptor table from the map file. This is typically
|> |     caused by a media failure or by a geometry mismatch.
|> |  LIL?  The second stage boot loader has been loaded at an incorrect
|> |     address. This is typically caused by a subtle geometry mismatch
|> |     or by moving /boot/boot.b without running the map installer.
|> |  LIL-  The descriptor table is corrupt. This can either be caused by
|> |     a geometry mismatch or by moving /boot/map without running the
|> |     map installer.
|> |  LILO  All parts of LILO have been successfully loaded.
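
As a quick reference, the excerpt above can be turned into a small lookup. This is a minimal sketch, not part of LILO: lilo_stage is a made-up helper name, and the messages are shortened paraphrases of the excerpt. Anything that starts with "L" but matches none of the longer prompts is treated as the "L plus error codes" case.

```shell
#!/bin/sh
# Map the characters LILO printed before hanging to the stage
# described in the excerpt. lilo_stage is a hypothetical helper.
lilo_stage() {
    case "$1" in
        "")     echo "no part of LILO loaded" ;;
        LILO)   echo "all parts of LILO loaded" ;;
        LIL-)   echo "descriptor table is corrupt" ;;
        "LIL?") echo "second stage loaded at an incorrect address" ;;
        LIL)    echo "second stage cannot load the descriptor table from the map file" ;;
        LI)     echo "first stage loaded the second stage but failed to execute it" ;;
        L*)     echo "first stage cannot load the second stage" ;;
        *)      echo "not a LILO prompt" ;;
    esac
}

# The prompt reported in this thread:
lilo_stage "LIL-"
```

For the prompt reported in this thread, this prints "descriptor table is corrupt", which the excerpt attributes to a geometry mismatch or to /boot/map having been moved without rerunning the map installer.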
|> |
|> |Let us know what you find - perhaps the installer is
|> |having trouble with the map installer.
|> |
|> |Keep smiling,
|> |Greg
|> |
|> |
|> | Message: 2
|> | From: "Chris Oubre" <[EMAIL PROTECTED]>
|> | To: <[EMAIL PROTECTED]>
|> | Date: Tue, 18 Mar 2003 17:24:18 -0600
|> | Subject: [Oscar-users] OSCAR cluster dead after simple changes
|> |
|> | We are currently working on a nasty problem with our OSCAR cluster.
|> |
|> | I am running OSCAR 2.1 on Red Hat 7.2 with a modified kernel to
|> | accommodate my e1000 gigabit cards and Mylex hardware RAID.
|> |
|> | First, the problem.
|> | We recently received some new nodes for our Beowulf cluster. We
|> | brought the cluster down to install them. We installed an additional
|> | network card in the switch and manually moved the position of the
|> | old nodes in the rack. When we rebooted the cluster (master and
|> | old nodes), NONE of the old nodes booted. They stop at a
|> | screen that says
|> |
|> |
|> | LIL-
|> |
|> | I have removed the additional card from the switch and tried rebooting
|> | everything. No joy.
|> | I am able to ping the master node from the switch and vice versa.
|> |
|> | Could you give me some ideas of where to look? Or do I need to
|> | completely reinstall? I'd rather avoid the reinstall because I do not
|> | know what caused the crash, and thus would be susceptible to a
|> | recurrence.
|> |
|> |
|> |
|> |****************************************************
|> |Christopher D. Oubre *
|> |email: [EMAIL PROTECTED] *
|> |research: http://cmt.rice.edu/~coubre *
|> |Web: http://www.angelfire.com/la2/oubre *
|> |Hangout: http://pub44.ezboard.com/bsouthterrebonne *
|> |Phone:(713)348-3541 Fax: (713)348-4150 *
|> |Rice University *
|> |Department of Physics, M.S. 61 *
|> |6100 Main St. ^-^ *
|> |Houston, Tx 77251-1892, USA (O O) *
|> |-= Phlax=- ( v ) *
|> |************************************m*m*************
|> |
|> |
|> |
|>
|>
|> _______________________________________________
|> Oscar-users mailing list
|> [EMAIL PROTECTED]
|> https://lists.sourceforge.net/lists/listinfo/oscar-users
|--
|Michael Chase-Salerno [EMAIL PROTECTED]
|IBM Linux Systems Technology Poughkeepsie, NY
|System Installation Suite www.sisuite.org
|
|
|