Thanks Mike,

The problem was indeed the different hostnames in the /etc/hosts file for the two NICs on the server. Once I gave both NICs the same hostname for simplicity, everything worked fine and the cluster is up and running. Thanks for the help.
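For anyone who lands on this thread with the same Torque failure, a quick sanity check along the following lines may help. It is only a sketch in Python, under stated assumptions: the helper names (`parse_hosts`, `addresses_for`, `name_is_on_address`) are made up for illustration, and the idea that the name printed by `hostname` should also appear on the cluster-facing NIC's line is just one reading of Mike's advice, not a documented Torque rule. The sample text is the pre-fix /etc/hosts quoted further down in the thread.

```python
def parse_hosts(text):
    """Parse /etc/hosts-style text into a list of (ip, [names]) tuples."""
    entries = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def addresses_for(hostname, entries):
    """Return every IP whose entry lists hostname (case-insensitive)."""
    wanted = hostname.lower()
    return [ip for ip, names in entries
            if wanted in [n.lower() for n in names]]


def name_is_on_address(hostname, ip, entries):
    """True if hostname is a name or alias on the line for the given IP."""
    return ip in addresses_for(hostname, entries)


# The /etc/hosts from the thread, as it looked before the fix.
SAMPLE = """\
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.0.1 Serv.MiloNET Serv oscar_server nfs_oscar pbs_oscar
129.100.171.111 CIB.beowulf.uwo.ca CIB
"""

if __name__ == "__main__":
    entries = parse_hosts(SAMPLE)
    reported = "CIB.beowulf.uwo.ca"   # what `hostname` printed on the head node
    cluster_ip = "192.168.0.1"        # internal NIC address in the sample file
    print("addresses for", reported, ":", addresses_for(reported, entries))
    print("listed on cluster NIC:",
          name_is_on_address(reported, cluster_ip, entries))
```

With the pre-fix file, the name reported by `hostname` is found only on the external address and not on the 192.168.0.1 line, which matches the mismatch described above. On a real head node you would read /etc/hosts from disk and substitute your own internal address.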
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Michael Edwards
Sent: Thursday, May 10, 2007 12:39 AM
To: oscar-users@lists.sourceforge.net
Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt

The server log is at /var/spool/pbs/server_logs/pbs_server.log

You also might look at /var/log/messages and at /opt/oscar/oscarinstall.log

On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> Mike,
> I do have 2 NICs on the server: an external WAN card and the internal
> cluster connection card. I have defined different hostnames on the
> external and internal NICs as below:
>
> [EMAIL PROTECTED] ~]# cat /etc/hosts
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1 localhost.localdomain localhost
> 192.168.0.1 Serv.MiloNET Serv oscar_server nfs_oscar pbs_oscar
> 129.100.171.111 CIB.beowulf.uwo.ca CIB
>
> A `hostname` command on the server node yields the CIB.beowulf.uwo.ca name.
> Out of curiosity, what's the point of defining a hostname for the internal
> NIC (Serv.MiloNET) when it is never used? Anyway, the terminal prompt on my
> server node is [EMAIL PROTECTED] ~], so it seems the [EMAIL PROTECTED]
> criterion is satisfied. If you could tell me where to look for the relevant
> log files, I'll copy and paste some (hopefully) useful clips in the next
> response.
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> Edwards
> Sent: Wednesday, May 09, 2007 1:12 PM
> To: oscar-users@lists.sourceforge.net
> Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt
>
> It is possible that if you went to the manufacturer's website and updated
> the BIOS, the problem might go away. UYOK would probably help
> since redhat already found the drivers that the machine needed, so
> SIS's auto-detection routine doesn't then need to.
>
> On the other issue, post install won't work until all the nodes you
> defined are imaged.
>
> If you only defined one initially and that one has been imaged and
> rebooted, this probably has to do with how your hostnames are set up
> on the head node. Torque has some fairly tight security and only lets
> "[EMAIL PROTECTED]" make configuration changes by default, where hostname
> is whatever shows up on the `hostname` command. Usually the easiest
> way to do this is to fiddle around with /etc/hosts.
>
> Usually something like
> 127.0.0.1 localhost.localdomain localhost
> xxx.xxx.xxx.xxx hostname.domainname hostname
> 10.0.0.1 internalhostname.internaldomainname internalhostname
>
> works, where xxx.xxx.xxx.xxx is the address of the external network
> adapter and 10.0.0.1 (or whatever internal network you are using) is
> the address of the network adapter you used in the "install_cluster"
> command.
>
> If you have a single network card on the head node, this is probably
> not the problem...
>
>
> On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> > Thanks Mike, I'll give the UYOK option a try if this continues to be a
> > problem.
> > I did however just manage to get one of my compute nodes to boot up
> > properly. All I did was manually define the Hard Drive parameters in the
> > BIOS instead of letting it auto-detect, and reimaged. I'm not 100% sure if
> > this solved the issue or if something else just 'clicked' during the
> > re-imaging, but it booted up fine.
> >
> > Now I'm getting errors when I try to run the post_install scripts step,
> > specifically relating to Torque:
> >
> > create queue workq
> > Configuration of TORQUE queues failed, check the logs at /var/spool/pbs at
> > /opt/oscar/packages/torque/scripts/post_install line 316
> > Script /opt/oscar/packages/torque/scripts/post_install
> > exitted badly with exit code '2' at ./post_install line 49
> > Couldn't run 'post_install' script for torque at ./post_install line 50
> > Some of the post install scripts failed, please check your logs for more
> > info at ./post_install line 55
> > --> Step 7: Failed to properly complete the cluster
> > install; please check the logs
> >
> > Which logs am I supposed to check to track down this issue? I looked around
> > the /var/spool/pbs folder, but couldn't find anything of relevance.
> > I'm about to ./start-over and reinstall the server fresh since I might have
> > messed something up while trying to fix the GRUB issue. I'll follow up once
> > I finish later today.
> >
> > -Milo
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> > Edwards
> > Sent: Wednesday, May 09, 2007 12:39 PM
> > To: oscar-users@lists.sourceforge.net
> > Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt
> >
> > The RAID messages are normal spam. SIS tries loading a lot of drivers
> > which don't work because you don't have that kind of hardware. Now if
> > you see some hardware modules that you think should load and don't,
> > then that is more of a problem.
> >
> > The grub messages I am less sure about...
> >
> > Did you try using the UYOK option on the "Setup Networking" step?
> > This uses the kernel and related files from the head node. This
> > frequently solves hardware related issues when the head and compute
> > nodes have more or less the same set of hardware.
> >
> > When the right storage drivers don't get loaded, the messages that pop
> > up are often misleading.
> >
> > On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> > > Hi All,
> > >
> > > I've been trying to get a bunch of old P2's clustered together using
> > > Fedora Core 5 and an old 10Mbit switch. The install has gone relatively
> > > trouble free up to this point. Once any of my compute nodes gets
> > > successfully imaged, it hangs at the GRUB prompt on boot-up. The keyboard
> > > cursor blinks, but I can't enter any input and the system just sits there.
> > >
> > > I looked through the imaging log, and found a few errors relating to
> > > software RAID drivers. Don't think it's the problem, but I'll paste the
> > > output here as well:
> > >
> > > Load software RAID modules.
> > > insmod: cannot insert
> > > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > > File exists (-1): File exists
> > > insmod: cannot insert
> > > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > > File exists (-1): File exists
> > > modprobe: module raid5 not found.
> > > modprobe: failed to load module raid5
> > > modprobe: module raid6 not found.
> > > modprobe: failed to load module raid6
> > > insmod: cannot insert
> > > `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko':
> > > File exists (-1): File exists
> > > Load device mapper driver (for LVM).
> > > Load additional filesystem drivers.
> > > modprobe: module fat not found.
> > > modprobe: failed to load module fat
> > > modprobe: module vfat not found.
> > > modprobe: failed to load module vfat
> > >
> > > I also get some errors near the end of the log that are probably related
> > > to this issue:
> > >
> > > Editing files for actual disk configuration...
> > > /dev/hda -> /dev/hda
> > > /etc/fstab
> > > /etc/systemconfig/systemconfig.conf
> > >
> > > mount /dev /a/dev -o bind || shellout
> > > Use of uninitialized value in concatenation (.) or string at
> > > /usr/lib/systemconfig/Initrd/RH.pm line 69.
> > > install_device not specified.
> > > Probing devices to guess BIOS drives. This may take a long time.
> > >
> > > install_device not specified.
> > > grep: /boot/grub/device.map: No such file or directory
> > > mv: cannot stat `/boot/grub/device.map': No such file or directory
> > > Probing devices to guess BIOS drives. This may take a long time.
> > > Installation finished. No error reported.
> > > This is the contents of the device map /boot/grub/device.map.
> > > Check if this is correct or not. If any of the lines is incorrect,
> > > fix it and re-run the script `grub-install'.
> > >
> > > (hd0) /dev/hda
> > > Use of uninitialized value in concatenation (.) or string at
> > > /usr/lib/systemconfig/Boot/Grub.pm line 346.
> > > Probing devices to guess BIOS drives. This may take a long time.
> > >
> > > Thanks Guys, any and all input/help is much appreciated. I would have
> > > attached the full imaging install log to this post if I could have, but
> > > I'm told a copy is not stored on the server, and I can't boot my compute
> > > nodes to get at the local copy there.
> > >
> > > -Milo
> > >
> > > SharcNET Head Office @ The University of Western Ontario
> > >
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by DB2 Express
> > > Download DB2 Express C - the FREE version of DB2 express and take
> > > control of your XML. No limits. Just data. Click to get it now.
> > > http://sourceforge.net/powerbar/db2/
> > > _______________________________________________
> > > Oscar-users mailing list
> > > Oscar-users@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/oscar-users