I have installed OSCAR 5.1b1 on a small cluster running FC8 on both the headnode (i386; eth0: private, eth1: public) and the clients (i386). During the installation we ran into several problems:
1/ We would like to build a heterogeneous cluster with both i386 and x86_64 clients. Following the installation guide, we created two remote repositories in /tftpboot/distro (one for i386 and one for x86_64). In this case the installation fails when trying to build the "OSCAR client image". If we remove one of the remote repositories, the "Build OSCAR client image" step runs smoothly. Is there a way to have several remote repositories in /tftpboot/distro for different architectures, and possibly different distributions?

2/ We have noticed some errors during the creation of the images:

    =============================================================================
    == Running step 4 of the OSCAR wizard: Build OSCAR client image
    =============================================================================
    ...
    Installing: pvm #################### [ 75/364]
    Installing: pvm ##################### [ 75/364]
    /var/tmp/rpm-tmp.2575: line 1: chown: command not found
    error: %post(pvm-3.4.5+6-2.i686) scriptlet failed, exit status 127
    ...
    Installing: torque [175/364]
    Installing: torque ##################### [175/364]
    /var/tmp/rpm-tmp.70649: line 4: cat: command not found
    error: %post(torque-2.1.8-3oscar.i686) scriptlet failed, exit status 127

and a lot of errors like this one:

    ...
    Installing: xorg-x11-filesystem ############## [176/364]
    Installing: xorg-x11-filesystem could not open ts_done file: [Errno 2] No such file or directory: '/var/lib/systemimager/images/fc8-i386/var/lib/yum/transaction-done.2008-03-26.12:38.30'
    --> Step 4: Running: post_binary_package_install (fc8-i386, eth0)
    ...
    Analysing configurator.html for sis
    Can't call method "Busy" without a package or object reference at /opt/oscar/lib/OSCAR/Configbox.pm line 613.
    Script /var/lib/oscar/packages/sis/api-post-image exitted badly with exit code '11' at /opt/oscar/scripts/post_rpm_install line 85
    Couldn't run post_rpm_nochroot for sis at /opt/oscar/scripts/post_rpm_install line 86
    --> About to run /var/lib/oscar/packages/sync-files/post_rpm_nochroot for sync-files
    Created templates for image fc8-i386 in /opt/sync_files/templates/image/fc8-i386
    Make sure these files do not contain any user accounts.
    Only system account IDs should be included!
    --> About to run /var/lib/oscar/packages/torque/client-post_install for torque
    There were errors running post_rpm_install scripts. Please check your logs. at /opt/oscar/scripts/post_rpm_install line 91
    Bad file descriptor at ./oscar_wizard line 535
    --> Marking installed bit in ODA for client binary packages
    ...
    >> Evaluating initrd size to be added in the kernel boot options
    >> (e.g. /etc/systemimager/pxelinux.cfg/syslinux.cfg):
    >> suggested value -> ramdisk_size=59728
    cat: write error: Broken pipe
    cat: write error: Broken pipe
    >>> Using kernel from: /boot/vmlinuz-2.6.24.3-34.fc8
    >> ls -l /etc/systemimager/boot/kernel
    ...
    Automatically create configuration file for systemconfigurator:
    >> /etc/systemconfig/systemconfig.conf
    cat: write error: Broken pipe
    cat: write error: Broken pipe
    cat: write error: Broken pipe
    --> Step 6: Successfully enabled UYOK

These errors do not seem to prevent the cluster from working, but we would like to be sure of that, and to fix them if possible.

3/ Finally, within the post_install script we ran into some problems:

    img_datasources: images: /var/lib/oscar/packages/ganglia/.configs/fc8-i386
    finding gmetad config for image fc8-i386
    [ganglia] configurator values for image fc8-i386 not found!
    [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
    ...
    --> About to run /var/lib/oscar/packages/opium/api-post-deploy for opium
    image: $VAR1 = 'fc8-i386';
    ---------------
    gexec_cluster() XML_ParseBuffer() error at line 1: no element found

We have no idea what is going on here.

The configuration of TORQUE also led to some problems:

    --> About to run /var/lib/oscar/packages/torque/api-post-deploy for torque
    [torque] Updating pbs_server nodes
    /opt/pbs/bin/pbsnodes: Server has no node list MSG=node list is empty
    qmgr obj=oscarnode1.lcmi.local svr=default: Unauthorized Request
    create node oscarnode1.lcmi.local np = 1 , properties = all
    qmgr obj=oscarnode2.lcmi.local svr=default: Unauthorized Request
    create node oscarnode2.lcmi.local np = 1 , properties = all
    Shutting down TORQUE Server: [OK]
    Starting TORQUE Server: [OK]
    [torque] Creating TORQUE workq queue...
    Max open servers: 4
    qmgr obj=workq svr=default: Unauthorized Request
    create queue workq
    Configuration of TORQUE queues failed...

We have managed to fix this last issue. In our configuration the headnode has two NICs (eth0: private, eth1: public). If we put the DNS name of the headnode in /var/spool/pbs/server_name, we are able to configure TORQUE properly. Could this be handled automatically when multiple NICs are detected?

We have enabled the headnode to be "part of the cluster" by checking "Run batch system client (pbs_mom) on head node" in the TORQUE configuration. Why is the headnode not part of the pbs_server nodes, then?

4/ Lastly, the tests performed well except for the Ganglia setup test, which reports 3 nodes instead of 2! I suppose this is connected with the fact that the headnode should be part of the PBS nodes (see above).
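
For reference, the server_name workaround from point 3 can be sketched roughly as follows. This is only a minimal Python sketch of what we did by hand: the helper write_server_name is hypothetical (it is not part of OSCAR or TORQUE), the path is the one from our installation and may differ elsewhere, and the name to write has to be chosen by the administrator.

```python
from pathlib import Path

# Path as used on our headnode; an assumption for any other installation.
SERVER_NAME_FILE = Path("/var/spool/pbs/server_name")


def write_server_name(name: str, path: Path = SERVER_NAME_FILE) -> None:
    """Write the headnode's DNS name as the single line of the server_name file.

    Hypothetical helper for illustration only: it just replaces the file
    content with the given name and does not touch any running service.
    """
    name = name.strip()
    if not name:
        raise ValueError("headnode name must not be empty")
    path.write_text(name + "\n")
```

It would be called as root with the DNS name of the headnode, for example write_server_name("headnode.example.local"), where the name shown is a placeholder and not our real host name.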