The server log is at /var/spool/pbs/server_logs/pbs_server.log

You might also look at /var/log/messages and /opt/oscar/oscarinstall.log
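
If the logs are long, a small script can pull out just the suspicious lines. The following is only a rough sketch: the three paths are the ones given in this message, while the keyword list and the `matching_lines` helper are illustrative assumptions, not anything TORQUE or OSCAR ships.

```python
from pathlib import Path

# Paths as given in this message; adjust them if your install differs.
LOG_PATHS = [
    "/var/spool/pbs/server_logs/pbs_server.log",
    "/var/log/messages",
    "/opt/oscar/oscarinstall.log",
]

# Assumed keywords; this is not an official list of TORQUE or OSCAR messages.
KEYWORDS = ("error", "fail", "unauthorized")


def matching_lines(path, keywords=KEYWORDS, limit=20):
    """Return the last `limit` lines of `path` containing any keyword (case-insensitive)."""
    p = Path(path)
    if not p.is_file():
        return []  # missing or unreadable path: nothing to report
    hits = []
    with p.open(errors="replace") as fh:
        for line in fh:
            lowered = line.lower()
            if any(k in lowered for k in keywords):
                hits.append(line.rstrip("\n"))
    return hits[-limit:]


if __name__ == "__main__":
    for log in LOG_PATHS:
        print(f"== {log}")
        for line in matching_lines(log):
            print(line)
```

Run it as root on the head node right after the failed post_install step, so the relevant entries are near the end of each file.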

On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> Mike,
> I do have 2 NICs on the server: an external WAN card and the internal
> cluster connection card. I have defined different hostnames on both the
> external and internal NICs as below:
>
> [EMAIL PROTECTED] ~]# cat /etc/hosts
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1       localhost.localdomain   localhost
> 192.168.0.1     Serv.MiloNET Serv oscar_server nfs_oscar pbs_oscar
> 129.100.171.111 CIB.beowulf.uwo.ca      CIB
>
> A `hostname` command on the server node yields the CIB.beowulf.uwo.ca name.
> Out of curiosity, what's the point of defining a hostname for the internal
> NIC (Serv.MiloNET) when it is never used? Anyway, the terminal prompt on my
> server node is [EMAIL PROTECTED] ~], so it seems the [EMAIL PROTECTED]
> criterion is satisfied. If you could tell me where to look for the relevant
> log files, I'll copy and paste some (hopefully) useful clips in the next
> response.
>
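
To double-check which address the `hostname` output maps to in /etc/hosts, something like the sketch below may help. It only parses hosts-file text and reports the mapping; `parse_hosts` and `check_hostname` are hypothetical helper names, the sample data is the /etc/hosts quoted above, and the sketch makes no claim about which address TORQUE itself expects.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry for a name, mirroring a top-down file lookup.
            table.setdefault(name.lower(), ip)
    return table


def check_hostname(hostname, hosts_text, expected_ip):
    """Return True if `hostname` is listed in the hosts text and maps to `expected_ip`."""
    return parse_hosts(hosts_text).get(hostname.lower()) == expected_ip


# Sample data copied from the /etc/hosts quoted above.
SAMPLE = """\
127.0.0.1       localhost.localdomain   localhost
192.168.0.1     Serv.MiloNET Serv oscar_server nfs_oscar pbs_oscar
129.100.171.111 CIB.beowulf.uwo.ca      CIB
"""

if __name__ == "__main__":
    print(check_hostname("CIB.beowulf.uwo.ca", SAMPLE, "129.100.171.111"))
```

On a live system, pass the contents of /etc/hosts and the output of `hostname` instead of the sample values.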
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> Edwards
> Sent: Wednesday, May 09, 2007 1:12 PM
> To: oscar-users@lists.sourceforge.net
> Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt
>
> Updating the BIOS from the manufacturer's website might make the
> problem go away.  UYOK would probably help too, since Red Hat has
> already found the drivers the machine needs, so SIS's auto-detection
> routine doesn't have to.
>
> On the other issue, post install won't work until all the nodes you
> defined are imaged.
>
> If you only defined one initially and that one has been imaged and
> rebooted, this probably has to do with how your hostnames are set up
> on the head node.  Torque has some fairly tight security and only lets
> "[EMAIL PROTECTED]" make configuration changes by default, where hostname
> is whatever shows up on the `hostname` command.  Usually the easiest
> way to do this is to fiddle around with /etc/hosts.
>
> Usually something like
> 127.0.0.1 localhost.localdomain localhost
> xxx.xxx.xxx.xxx hostname.domainname hostname
> 10.0.0.1 internalhostname.internaldomainname internalhostname
>
> works, where xxx.xxx.xxx.xxx is the address of the external network
> adapter and 10.0.0.1 (or whatever internal network you are using) is
> the address of the network adapter you used in the "install_cluster"
> command.
>
> If you have a single network card on the head node, this is probably
> not the problem...
>
>
> On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> > Thanks Mike, I'll give the UYOK option a try if this continues to be a
> > problem. I did however just manage to get one of my compute nodes to boot
> > up properly. All I did was manually define the hard drive parameters in
> > the BIOS instead of letting it auto-detect, and reimaged.  I'm not 100%
> > sure if this solved the issue or if something else just 'clicked' during
> > the re-imaging, but it booted up fine.
> > Now I'm getting errors when I try to run the post_install scripts step,
> > specifically relating to Torque:
> >
> > create queue workq
> > Configuration of TORQUE queues failed, check the logs at /var/spool/pbs at /opt/oscar/packages/torque/scripts/post_install line 316
> > Script /opt/oscar/packages/torque/scripts/post_install exitted badly with exit code '2' at ./post_install line 49
> > Couldn't run 'post_install' script for torque at ./post_install line 50
> > Some of the post install scripts failed, please check your logs for more info at ./post_install line 55
> > --> Step 7: Failed to properly complete the cluster install; please check the logs
> >
> > Which logs am I supposed to check to track down this issue? I looked around
> > the /var/spool/pbs folder, but couldn't find anything of relevance.
> > I'm about to ./start-over and reinstall the server fresh, since I might
> > have messed something up while trying to fix the GRUB issue. I'll follow
> > up once I finish later today.
> >
> > -Milo
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Michael
> > Edwards
> > Sent: Wednesday, May 09, 2007 12:39 PM
> > To: oscar-users@lists.sourceforge.net
> > Subject: Re: [Oscar-users] Compute nodes freeze at GRUB prompt
> >
> > The RAID messages are normal spam.  SIS tries loading a lot of drivers
> > which don't work because you don't have that kind of hardware.  Now if
> > you see some hardware modules that you think should load and don't,
> > then that is more of a problem.
> >
> > The grub messages I am less sure about...
> >
> > Did you try using the UYOK option on the "Setup Networking" step?
> > This uses the kernel and related files from the head node.  This
> > frequently solves hardware related issues when the head and compute
> > nodes have more or less the same set of hardware.
> >
> > When the right storage drivers don't get loaded, the messages that pop
> > up are often misleading.
> >
> > On 5/9/07, Milo <[EMAIL PROTECTED]> wrote:
> > >
> > >
> > >
> > >
> > > Hi All,
> > >
> > >
> > >
> > > I've been trying to get a bunch of old P2's clustered together using
> > > Fedora Core 5 and an old 10Mbit switch.  The install has gone relatively
> > > trouble-free up to this point. Once any of my compute nodes gets
> > > successfully imaged, it hangs at the GRUB prompt on boot-up. The keyboard
> > > cursor blinks, but I can't enter any input and the system just sits there.
> > >
> > >
> > >
> > > I looked through the imaging log, and found a few errors relating to
> > > software RAID drivers. I don't think it's the problem, but I'll paste the
> > > output here as well:
> > >
> > > Load software RAID modules.
> > > insmod: cannot insert `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko': File exists (-1): File exists
> > > insmod: cannot insert `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko': File exists (-1): File exists
> > > modprobe: module raid5 not found.
> > > modprobe: failed to load module raid5
> > > modprobe: module raid6 not found.
> > > modprobe: failed to load module raid6
> > > insmod: cannot insert `/lib/modules/2.6.18-boel_v3.7.5/kernel/drivers/md/md-mod.ko': File exists (-1): File exists
> > > Load device mapper driver (for LVM).
> > > Load additional filesystem drivers.
> > > modprobe: module fat not found.
> > > modprobe: failed to load module fat
> > > modprobe: module vfat not found.
> > > modprobe: failed to load module vfat
> > >
> > >
> > >
> > >
> > >
> > > I also get some errors near the end of the log that are probably related
> > > to this issue:
> > >
> > >
> > >
> > > Editing files for actual disk configuration...
> > > /dev/hda -> /dev/hda
> > > /etc/fstab
> > > /etc/systemconfig/systemconfig.conf
> > >
> > > mount /dev /a/dev -o bind || shellout
> > > Use of uninitialized value in concatenation (.) or string at /usr/lib/systemconfig/Initrd/RH.pm line 69.
> > > install_device not specified.
> > > Probing devices to guess BIOS drives. This may take a long time.
> > >
> > > install_device not specified.
> > > grep: /boot/grub/device.map: No such file or directory
> > > mv: cannot stat `/boot/grub/device.map': No such file or directory
> > > Probing devices to guess BIOS drives. This may take a long time.
> > > Installation finished. No error reported.
> > > This is the contents of the device map /boot/grub/device.map.
> > > Check if this is correct or not. If any of the lines is incorrect,
> > > fix it and re-run the script `grub-install'.
> > >
> > > (hd0) /dev/hda
> > > Use of uninitialized value in concatenation (.) or string at /usr/lib/systemconfig/Boot/Grub.pm line 346.
> > > Probing devices to guess BIOS drives. This may take a long time.
> > >
> > >
> > >
> > >
> > > Thanks guys, any and all input/help is much appreciated.  I would have
> > > attached the full imaging install log to this post if I could have, but
> > > I'm told a copy is not stored on the server, and I can't boot my compute
> > > nodes to get at the local copy there.
> > >
> > >
> > >
> > >
> > >
> > > -Milo
> > >
> > > SharcNET Head Office @ The University of Western Ontario
> > >
> > >
> > >
> > > -------------------------------------------------------------------------
> > > This SF.net email is sponsored by DB2 Express
> > > Download DB2 Express C - the FREE version of DB2 express and take
> > > control of your XML. No limits. Just data. Click to get it now.
> > > http://sourceforge.net/powerbar/db2/
> > > _______________________________________________
> > > Oscar-users mailing list
> > > Oscar-users@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > >
> > >
