Just thought I would share my experiences here for those having installation
problems, since I've now been using OSCAR 6.0.5 successfully on a production
cluster for several months. My first install was marred by broken dependencies
in the torque packages - I was forced to use SGE, which ended up being
suboptimal (see my earlier posts if interested). Geoffroy quickly fixed
these dependencies in trunk, but the RPMs have still not been updated, so I
hit all the same issues during my second round of imaging.
The fix is of course to compile your own opkgs. First check out the OSCAR
trunk with 'svn co http://svn.oscar.openclustergroup.org/oscar/trunk oscar'
(instructions at
http://svn.oscar.openclustergroup.org/trac/oscar/wiki/SVNinstructions). Get
opkgc with 'yum install opkgc', then go to the oscar/packages/torque directory
and (as root) run 'opkgc --dist=rhel' to compile the opkgs. The resulting RPMs
will be in /usr/src/redhat/RPMS/x86_64/ on a 64-bit CentOS/RHEL distribution.
For those who don't want to bother with this process, I have put the updated
x86_64 opkgs here: http://biophys.chem.columbia.edu/oscar/. Hopefully these
will make it into the unstable repo soon enough (I'm guessing for 6.0.6).
Place the opkgs in /tftpboot/oscar/rhel-5-x86_64/ and run
'packman --prepare-repo /tftpboot/oscar/rhel-5-x86_64' (or the i386
equivalents) to make the repo available to OSCAR. Also create a local repo
from your distribution's install disk per the instructions in the official
documentation. With the repos in place, and as long as you remember to turn
off all firewalls, SELinux, etc., installation is a breeze. I should mention
this is CentOS 5.5, fully updated.
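For reference, the build-and-publish steps above can be strung together
roughly as follows. This is only a sketch of what I described, not an official
OSCAR script: the run() helper, the DRY_RUN switch and the opkg-torque*.rpm
file name pattern are my own assumptions for illustration, so check what opkgc
actually drops into the RPMS directory before copying. By default it only
prints the commands; set DRY_RUN=0 to execute them as root.

```shell
#!/bin/sh
# Sketch: rebuild the torque opkgs from OSCAR trunk and publish them to the
# local OSCAR repo. Prints the commands unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
ARCH="${ARCH:-x86_64}"                    # use i386 on 32-bit systems
RPM_DIR="/usr/src/redhat/RPMS/${ARCH}"    # where the built RPMs end up
REPO="/tftpboot/oscar/rhel-5-${ARCH}"     # local OSCAR package repo

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Check out trunk, install opkgc and compile the torque opkgs.
build_torque_opkgs() {
    run svn co http://svn.oscar.openclustergroup.org/oscar/trunk oscar &&
    run yum install opkgc &&
    run cd oscar/packages/torque &&
    run opkgc --dist=rhel
}

# Copy the RPMs into the repo directory and regenerate the repo metadata.
# ASSUMPTION: the file name pattern below is a guess; adjust it to the
# files that opkgc really produced.
publish_opkgs() {
    run cp "$RPM_DIR"/opkg-torque*.rpm "$REPO"/ &&
    run packman --prepare-repo "$REPO"
}

build_torque_opkgs && publish_opkgs
```

Running it once in dry-run mode first is a cheap way to confirm the paths
match your architecture before anything is touched.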
The next hiccup came after imaging - the nodes imaged just fine but would not
boot, giving kernel panics and errors like "setuproot: unable to mount
/dev/root". This is hardware-related. I am using Dell PowerEdge T610s (8-core)
as my nodes, which need the mptbase, mptsas, and ata_piix kernel modules in
the initrd to boot properly. Solution: after creating the image but before
imaging the nodes, run
    mkinitrd --preload mptbase --preload mptsas --preload ata_piix \
        --without-dmraid --omit-lvm-modules \
        /var/lib/systemimager/images/<image name>/boot/initrd-<kernel version>.el5.img \
        <kernel version>
This of course assumes you don't need LVM and dmraid on your system; if you
do, leave out --without-dmraid and --omit-lvm-modules. You can find your
kernel version with 'uname -r'. Note that on CentOS 5 its output already
carries the el5 suffix, so make sure the initrd file name matches the one
actually present in the image's /boot.
With the new initrd in place, run or re-run OSCAR's Step 6 (if you've already
run it, repeating it seems to be important for some reason) and then image
your nodes. Using this approach I have all the experimental packages working
except for SGE and linux-ha (which I don't need).
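If you have to do this for more than one image, the mkinitrd call above can be
wrapped in a small function. Again just a sketch under assumptions: the
function name, the DRY_RUN switch and the image name "oscarimage" are
placeholders of mine, and the target is written as initrd-<uname -r>.img on
the assumption that the kernel version string already includes the el5
suffix. It only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: rebuild an image's initrd with extra storage modules preloaded.
# Prints the mkinitrd command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
IMAGE_ROOT="/var/lib/systemimager/images"

# usage: rebuild_initrd <image name> <kernel version> <module>...
rebuild_initrd() {
    image="$1"
    kver="$2"
    shift 2
    # ASSUMPTION: the initrd in the image is named initrd-<kernel version>.img
    target="$IMAGE_ROOT/$image/boot/initrd-$kver.img"

    # Turn the remaining arguments into repeated --preload options.
    args=""
    for mod in "$@"; do
        args="$args --preload $mod"
    done

    # Drop --without-dmraid / --omit-lvm-modules if you need dmraid or LVM.
    # mkinitrd will not overwrite an existing file unless -f is added.
    set -- mkinitrd $args --without-dmraid --omit-lvm-modules "$target" "$kver"

    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# "oscarimage" is a placeholder image name; substitute your own.
rebuild_initrd oscarimage "$(uname -r)" mptbase mptsas ata_piix
```

Using 'uname -r' here only makes sense if the head node runs the same kernel
as the image; otherwise pass the image's kernel version explicitly.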
There are also a couple of bugs in the testing scripts for 6.0.5 (this should
probably go to the devel mailing list, but since I am not yet a member I'll
write it here for now). In
/var/lib/perl5/vendor_perl/5.8.8/OSCAR/OCA/RM_Detect/TORQUE.pm, line 38 should
be 'test => "/var/lib/oscar/testing/$pkg/$test"' instead of
"/var/lib/oscar/$pkg/testing/$test". In /var/lib/oscar/testing/ganglia/test_user,
lines 97 and 128 should both read "if ($hosts == $numhosts) {", not
"($hosts eq $numhosts)", i.e. a numeric rather than a string comparison.
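For anyone who would rather not edit the files by hand, the two fixes can be
applied with sed along these lines. This is a sketch, not a tested patch
against the shipped files: the function names are mine, and the substitutions
match on the text quoted above rather than on the line numbers, so they touch
every matching line in the file. Each call leaves a .bak copy of the original
next to it.

```shell
#!/bin/sh
# Sketch: apply the two 6.0.5 test-script fixes with sed, keeping backups.

# Swap the misplaced "testing" path component in TORQUE.pm.
# $1: path to TORQUE.pm
fix_torque_test_path() {
    sed -i.bak \
        's|/var/lib/oscar/\$pkg/testing/\$test|/var/lib/oscar/testing/$pkg/$test|' \
        "$1"
}

# Replace the string comparison (eq) with a numeric one (==).
# $1: path to the ganglia test_user script
fix_ganglia_compare() {
    sed -i.bak \
        's|(\$hosts eq \$numhosts)|($hosts == $numhosts)|' \
        "$1"
}

# Example (as root on the head node):
#   fix_torque_test_path /var/lib/perl5/vendor_perl/5.8.8/OSCAR/OCA/RM_Detect/TORQUE.pm
#   fix_ganglia_compare  /var/lib/oscar/testing/ganglia/test_user
```

Diffing each file against its .bak copy afterwards shows exactly which lines
changed.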
Finally, I am using multiple NICs on each node and multiple gigabit switches
for communication with OpenMPI (which knows how to fully utilize such a setup).
As a word of warning, network latency has been a disappointing limit on MD
simulation (GROMACS) performance - a modern 8-core machine generates so much
data so quickly that it swamps the gigabit network interfaces. There's a nice
paper on partially alleviating these issues: Kutzner C. et al. (2007),
J. Computational Chemistry 28(12): 2075-2084. For those who can afford it,
however, I strongly recommend InfiniBand or Myrinet.

Hope someone finds this useful...

Best,
Ivan
____________________
Ivan V. Sergeyev
McDermott Group
Columbia University
3000 Broadway, MC 3132
New York, NY, 10027
iserge...@gmail.com






_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
