Mike,
I do have 2 NICs on the server: an external WAN card and the internal
cluster connection card. I have defined different hostnames on the
external and internal NICs, as below:

[EMAIL PROTECTED] ~]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain   localhost
192.168.0.1     Serv.MiloNET Serv oscar_server nfs_oscar pbs_oscar
129.100.171.111 CIB.beowulf.uwo.ca      CIB

A `hostname` command on the server node yields the CIB.beowulf.uwo.ca name.
Out of curiosity, what's the point of defining a hostname for the internal
NIC (Serv.MiloNET) when it is never used? Anyway, the terminal prompt on my
server node is [EMAIL PROTECTED] ~], so it seems the [EMAIL PROTECTED]
criterion is satisfied. If you could tell me where to look for the relevant
log files, I'll copy and paste some (hopefully) useful clips in my next
response.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael Edwards
Sent: Wednesday, May 09, 2007 1:12 PM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt

If you go to the manufacturer's website and update the BIOS, the problem
might go away.  UYOK would probably help, since Red Hat has already found
the drivers the machine needs, so SIS's auto-detection routine doesn't
have to.

On the other issue, post install won't work until all the nodes you
defined are imaged.

If you only defined one initially and that one has been imaged and
rebooted, this probably has to do with how your hostnames are set up
on the head node.  Torque has some fairly tight security and only lets
"[EMAIL PROTECTED]" make configuration changes by default, where hostname
is whatever shows up on the `hostname` command.  Usually the easiest
way to do this is to fiddle around with /etc/hosts.

Usually something like
127.0.0.1 localhost.localdomain localhost
xxx.xxx.xxx.xxx hostname.domainname hostname
10.0.0.1 internalhostname.internaldomainname internalhostname

works, where xxx.xxx.xxx.xxx is the address of the external network
adapter and 10.0.0.1 (or whatever internal network you are using) is
the address of the network adapter you used in the "install_cluster"
command.
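
As a rough way to sanity-check that layout, here is a minimal Python
sketch. It only parses hosts-file text and reports which address a given
name maps to. The function names (parse_hosts, check_head_node) are made
up for illustration, and the idea that the `hostname` output should sit on
the external adapter's line is just my reading of the layout above, not
something taken from the Torque docs.

```python
def parse_hosts(text):
    """Map every name and alias in hosts-file text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        ip, *hostnames = line.split()
        for name in hostnames:
            # Keep the first address seen for a name (assumed top-down lookup).
            names.setdefault(name, ip)
    return names


def check_head_node(hosts_text, hostname, external_ip):
    """Return True if `hostname` is listed and maps to `external_ip`.

    Hypothetical helper: it checks the file layout only and does not
    query Torque or the resolver.
    """
    return parse_hosts(hosts_text).get(hostname) == external_ip
```

You would feed it the contents of /etc/hosts, the output of `hostname`,
and the address of the external adapter; a False result suggests the name
is missing from the file or is sitting on a different line (for example
the 127.0.0.1 one).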

If you have a single network card on the head node, this is probably
not the problem...


On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> Thanks Mike, I'll give the UYOK option a try if this continues to be a
> problem. I did however just manage to get one of my compute nodes to boot
> up properly. All I did was manually define the hard drive parameters in the
> BIOS instead of letting it auto-detect, and reimaged.  I'm not 100% sure
> if this solved the issue or if something else just 'clicked' during the
> re-imaging, but it booted up fine.
> Now I'm getting errors when I try to run the post_install scripts step,
> specifically relating to Torque:
>
> create queue workq
> Configuration of TORQUE queues failed, check the logs at /var/spool/pbs at
> /opt/oscar/packages/torque/scripts/post_install line 316
> Script /opt/oscar/packages/torque/scripts/post_install exitted badly with
> exit code '2' at ./post_install line 49
> Couldn't run 'post_install' script for torque at ./post_install line 50
> Some of the post install scripts failed, please check your logs for more
> info at ./post_install line 55
> --> Step 7: Failed to properly complete the cluster install; please check
> the logs
>
> Which logs am I supposed to check to track down this issue? I looked around
> the /var/spool/pbs folder, but couldn't find anything of relevance.
> I'm about to ./start-over and reinstall the server fresh, since I might
> have messed something up while trying to fix the GRUB issue. I'll follow up
> once I finish later today.
>
> -Milo
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael Edwards
> Sent: Wednesday, May 09, 2007 12:39 PM
> To: oscar-users@lists.sourceforge.net
> Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt
>
> The RAID messages are normal spam.  SIS tries loading a lot of drivers
> which don't work because you don't have that kind of hardware.  Now if
> you see some hardware modules that you think should load and don't,
> then that is more of a problem.
>
> The grub messages I am less sure about...
>
> Did you try using the UYOK option on the "Setup Networking" step?
> This uses the kernel and related files from the head node.  This
> frequently solves hardware related issues when the head and compute
> nodes have more or less the same set of hardware.
>
> When the right storage drivers don't get loaded, the messages that pop
> up are often misleading.
>
> On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> >
> >
> >
> >
> > Hi All,
> >
> >
> >
> > I've been trying to get a bunch of old P2s clustered together using
> > Fedora Core 5 and an old 10Mbit switch.  The install has gone relatively
> > trouble-free up to this point. Once any of my compute nodes gets
> > successfully imaged, it hangs at the GRUB prompt on boot-up. The keyboard
> > cursor blinks, but I can't enter any input and the system just sits there.
> >
> >
> >
> > I looked through the imaging log, and found a few errors relating to
> > software RAID drivers. I don't think it's the problem, but I'll paste the
> > output here as well:
> >
> > Load software RAID modules.
> >
> > insmod: cannot insert
> > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > File exists (-1): File exists
> >
> > insmod: cannot insert
> > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > File exists (-1): File exists
> >
> > modprobe: module raid5 not found.
> >
> > modprobe: failed to load module raid5
> >
> > modprobe: module raid6 not found.
> >
> > modprobe: failed to load module raid6
> >
> > insmod: cannot insert
> > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > File exists (-1): File exists
> >
> > Load device mapper driver (for LVM).
> >
> > Load additional filesystem drivers.
> >
> > modprobe: module fat not found.
> >
> > modprobe: failed to load module fat
> >
> > modprobe: module vfat not found.
> >
> > modprobe: failed to load module vfat
> >
> >
> >
> >
> >
> > I also get some errors near the end of the log that are probably related
> > to this issue:
> >
> >
> >
> > Editing files for actual disk configuration...
> >
> > /dev/hda -> /dev/hda
> >
> > /etc/fstab
> >
> > /etc/systemconfig/systemconfig.conf
> >
> >
> >
> > mount /dev /a/dev -o bind || shellout
> >
> > Use of uninitialized value in concatenation (.) or string at
> > /usr/lib/systemconfig/Initrd/RH.pm line 69.
> >
> > install_device not specified.
> >
> > Probing devices to guess BIOS drives. This may take a long time.
> >
> >
> >
> > install_device not specified.
> >
> > grep: /boot/grub/device.map: No such file or directory
> >
> > mv: cannot stat `/boot/grub/device.map': No such file or directory
> >
> > Probing devices to guess BIOS drives. This may take a long time.
> >
> > Installation finished. No error reported.
> >
> > This is the contents of the device map /boot/grub/device.map.
> >
> > Check if this is correct or not. If any of the lines is incorrect,
> >
> > fix it and re-run the script `grub-install'.
> >
> >
> >
> > (hd0) /dev/hda
> >
> > Use of uninitialized value in concatenation (.) or string at
> > /usr/lib/systemconfig/Boot/Grub.pm line 346.
> >
> > Probing devices to guess BIOS drives. This may take a long time.
> >
> >
> >
> >
> > Thanks guys, any and all input/help is much appreciated.  I would have
> > attached the full imaging install log to this post if I could have, but
> > I'm told a copy is not stored on the server, and I can't boot my compute
> > nodes to get at the local copy there.
> >
> >
> >
> >
> >
> > -Milo
> >
> > SharcNET Head Office @ The University of Western Ontario
> >
> >
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by DB2 Express
> > Download DB2 Express C - the FREE version of DB2 express and take
> > control of your XML. No limits. Just data. Click to get it now.
> > http://sourceforge.net/powerbar/db2/
> > _______________________________________________
> > Oscar-users mailing list
> > Oscar-users@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> >
> >
>
