Just thought I would share my experiences here for those having installation
problems, since I've now been using OSCAR 6.0.5 successfully on a production
cluster for several months. My first install was marred by broken dependencies
in the torque packages - I was forced to use SGE, which ended up being
suboptimal (see my earlier posts if interested). Geoffroy quickly fixed
these dependencies in trunk, but to this date the RPMs have not been updated.
As a result, I hit all the same issues during my second round of imaging. The
fix is to compile your own opkgs. To do so, first check out the OSCAR
trunk using 'svn co http://svn.oscar.openclustergroup.org/oscar/trunk oscar' -
instructions at
http://svn.oscar.openclustergroup.org/trac/oscar/wiki/SVNinstructions. Get
opkgc using 'yum install opkgc', then go to the oscar/packages/torque directory
and (as root) run 'opkgc --dist=rhel' to compile the opkgs. These will be
located in /usr/src/redhat/RPMS/x86_64/ for a 64-bit CentOS/RHEL distribution.
For those who don't want to bother with this process, I provide the updated
x86_64 opkgs here: http://biophys.chem.columbia.edu/oscar/. Hopefully these
will make it into the unstable repo soon enough (I'm guessing for 6.0.6).
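
The build steps above could be scripted roughly as follows. This is only a
sketch: the run helper, the DRY_RUN switch and the function name are made up
for illustration, and only the svn, yum and opkgc commands come from the
description above. By default it just prints what it would do.

```shell
#!/bin/sh
# Sketch: rebuild the torque opkgs from OSCAR trunk.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 and
# run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

build_torque_opkgs() {
    # Check out trunk into ./oscar
    run svn co http://svn.oscar.openclustergroup.org/oscar/trunk oscar
    # Install the opkg compiler
    run yum install opkgc
    # Build the opkgs inside the torque package directory (subshell keeps
    # the caller's working directory unchanged)
    ( run cd oscar/packages/torque && run opkgc --dist=rhel )
}

build_torque_opkgs
```

Once the printed commands look right, re-run with DRY_RUN=0; the resulting
RPMs should then appear under /usr/src/redhat/RPMS/x86_64/ on a 64-bit system.
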
Place these opkgs in /tftpboot/oscar/rhel-5-x86_64/ and run 'packman
--prepare-repo /tftpboot/oscar/rhel-5-x86_64' (or the i386 equivalent) to make
the repo available to OSCAR. Also create a local repo of your distribution's
install disk per the instructions in the official documentation. With the repos
in place, and as long as you remember to turn off all firewalls, SELinux, etc.,
installation is a breeze. I should mention this is CentOS 5.5, fully updated.
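
A rough sketch of the repo step, for anyone who wants to script it. The
function name, the run helper and DRY_RUN are illustrative assumptions; the
two directories and the packman call are the ones mentioned above. Note that
it copies every .rpm found in the source directory, which assumes that
directory holds only the freshly built opkgs.

```shell
#!/bin/sh
# Sketch: copy locally built opkg RPMs into the OSCAR repo and re-index it.
# DRY_RUN=1 (default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# publish_opkgs RPM_DIR REPO_DIR
publish_opkgs() {
    rpm_dir="$1"
    repo="$2"
    for rpm in "$rpm_dir"/*.rpm; do
        # Skip the unexpanded pattern when the directory has no RPMs
        [ -e "$rpm" ] || continue
        run cp "$rpm" "$repo/"
    done
    # Make the repo available to OSCAR
    run packman --prepare-repo "$repo"
}

publish_opkgs /usr/src/redhat/RPMS/x86_64 /tftpboot/oscar/rhel-5-x86_64
```

For a 32-bit setup, pass the i386 directories instead.
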
The next hiccup came after imaging - the nodes imaged just fine but would not
boot, giving kernel panics and errors like "setuproot: unable to mount
/dev/root". This is hardware-related. I am using Dell PowerEdge T610s (8-core)
as my nodes, which require the mptbase, mptsas, and ata_piix kernel modules to
boot properly. Solution: after image creation but before imaging the nodes, run

  mkinitrd --preload mptbase --preload mptsas --preload ata_piix \
    --without-dmraid --omit-lvm-modules \
    /var/lib/systemimager/images/<image name>/boot/initrd-<kernel version>.el5.img \
    <kernel version>

This of course assumes you don't need LVM and dmraid on your system; if you do,
leave out the --without-dmraid and --omit-lvm-modules options. You can find
your kernel version with 'uname -r'. With the new initrd in place, run or
re-run OSCAR's Step 6 (if you've already run it, repeating it seems to be
important for some reason) and then image your nodes. Using this approach I
have all the experimental packages working except for sge and linux-ha (which
I don't need).
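
To avoid typos in that long mkinitrd line, a small helper can assemble it
from the image name and kernel version. This is a sketch under assumptions:
the function name and the image name "myimage" are hypothetical, and it
assumes the kernel version is the full 'uname -r' string (which on CentOS 5
already ends in .el5), so the initrd file is named initrd-<that string>.img.
It only prints the command; nothing is executed.

```shell
#!/bin/sh
# Sketch: print the mkinitrd command for a SystemImager image.
# mkinitrd_cmd IMAGE_NAME KERNEL_VERSION
mkinitrd_cmd() {
    image="$1"
    kver="$2"
    # Assumption: KERNEL_VERSION already carries the .el5 suffix
    initrd="/var/lib/systemimager/images/${image}/boot/initrd-${kver}.img"
    printf '%s %s %s\n' \
        'mkinitrd --preload mptbase --preload mptsas --preload ata_piix --without-dmraid --omit-lvm-modules' \
        "$initrd" "$kver"
}

# "myimage" is a placeholder; substitute your own image name.
mkinitrd_cmd myimage "$(uname -r)"
```

Check the printed line against your image directory, then run it as root.
Drop the --without-dmraid and --omit-lvm-modules options from the function if
your nodes need dmraid or LVM.
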
There are also a couple of bugs in the testing scripts for 6.0.5 (this should
probably go to the devel mailing list, but since I am not yet a member I'll
write it here for now). In
/var/lib/perl5/vendor_perl/5.8.8/OSCAR/OCA/RM_Detect/TORQUE.pm, line 38 should
be 'test => "/var/lib/oscar/testing/$pkg/$test"' instead of
"/var/lib/oscar/$pkg/testing/$test". In /var/lib/oscar/testing/ganglia/test_user,
both lines 97 and 128 should read "if ($hosts == $numhosts) {", not "($hosts eq
$numhosts)", i.e. a numeric comparison rather than a string comparison.
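
Before editing, it may be worth printing the lines in question to confirm
they match what is described above on your install. A minimal sketch,
assuming the paths and line numbers given above; show_line is just an
illustrative helper name, and files that are missing are silently skipped.

```shell
#!/bin/sh
# Sketch: print line N of FILE so it can be compared with the fixes above.
# show_line FILE N
show_line() {
    sed -n "${2}p" "$1"
}

# Paths and line numbers as given in the post.
torque_pm=/var/lib/perl5/vendor_perl/5.8.8/OSCAR/OCA/RM_Detect/TORQUE.pm
ganglia_test=/var/lib/oscar/testing/ganglia/test_user

if [ -r "$torque_pm" ]; then
    show_line "$torque_pm" 38
fi
if [ -r "$ganglia_test" ]; then
    show_line "$ganglia_test" 97
    show_line "$ganglia_test" 128
fi
```

If the printed lines differ from the ones quoted above, your copy of the
scripts may already carry a different revision, so edit by content rather
than by line number.
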
Finally, I am using multiple NICs on each node and multiple gigabit switches
for communication with OpenMPI (which knows how to fully utilize such a setup).
As a word of advice, the latency has been disappointing for MD simulation
(GROMACS) performance - a modern 8-core machine generates so much data so
quickly that it swamps the gigabit network interfaces. There's a nice paper on
partially alleviating these issues: Kutzner C. et al. (2007), J. Comput.
Chem. 28(12): 2075-2084. For those who can afford it, however, I strongly
recommend InfiniBand or Myrinet.
Hope someone finds this useful...
Best,
Ivan
____________________
Ivan V. Sergeyev
McDermott Group
Columbia University
3000 Broadway, MC 3132
New York, NY, 10027
iserge...@gmail.com
_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users