I have installed OSCAR 5.1b1 on a small cluster running
FC8 on both the headnode (i386; eth0: private, eth1: public) and the clients
(i386). During the installation
we ran into several problems:

1/ We would like to build a heterogeneous cluster
with both i386 and x86_64 clients. Following the installation guide,
we created two remote repositories in /tftpboot/distro (one for i386
and one for x86_64).
In this case the installation fails at the "Build OSCAR client
image" step.

If we remove one of the remote repositories, the "Build OSCAR client
image" step runs smoothly.

Is there a way to have several remote repositories in /tftpboot/distro
for different architectures, and possibly different distributions?

2/ We noticed some errors during the creation of the images:

=============================================================================
== Running step 4 of the OSCAR wizard: Build OSCAR client image
=============================================================================

...
Installing: pvm                          ##################### [ 75/364] 
/var/tmp/rpm-tmp.2575: line 1: chown: command not found
error: %post(pvm-3.4.5+6-2.i686) scriptlet failed, exit status 127

...
Installing: torque                       ##################### [175/364] 
/var/tmp/rpm-tmp.70649: line 4: cat: command not found
error: %post(torque-2.1.8-3oscar.i686) scriptlet failed, exit status 127

and a lot of errors like this one:

...
  Installing: xorg-x11-filesystem          ##############        [176/364]  
Installing: xorg-x11-filesystem          
could not open ts_done file: [Errno 2] No such file or directory: '/var/lib/systemimager/images/fc8-i386/var/lib/yum/transaction-done.2008-03-26.12:38.30'

--> Step 4: Running: post_binary_package_install (fc8-i386, eth0)
...
Analysing configurator.html for sis
Can't call method "Busy" without a package or object reference at /opt/oscar/lib/OSCAR/Configbox.pm line 613.
Script /var/lib/oscar/packages/sis/api-post-image exitted badly with exit code '11' at /opt/oscar/scripts/post_rpm_install line 85
Couldn't run post_rpm_nochroot for sis at /opt/oscar/scripts/post_rpm_install line 86
--> About to run /var/lib/oscar/packages/sync-files/post_rpm_nochroot for sync-files
Created templates for image fc8-i386 in /opt/sync_files/templates/image/fc8-i386
Make sure these files do not contain any user accounts.
Only system account IDs should be included!
--> About to run /var/lib/oscar/packages/torque/client-post_install for torque
There were errors running post_rpm_install scripts.  Please check your logs. at /opt/oscar/scripts/post_rpm_install line 91
Bad file descriptor at ./oscar_wizard line 535
--> Marking installed bit in ODA for client binary packages
...

>> Evaluating initrd size to be added in the kernel boot options
>> (e.g. /etc/systemimager/pxelinux.cfg/syslinux.cfg):
 >>     suggested value -> ramdisk_size=59728

cat: write error: Broken pipe
cat: write error: Broken pipe
>>> Using kernel from:          /boot/vmlinuz-2.6.24.3-34.fc8
 >> ls -l /etc/systemimager/boot/kernel
...
Automatically create configuration file for systemconfigurator:
  >> /etc/systemconfig/systemconfig.conf
cat: write error: Broken pipe
cat: write error: Broken pipe
cat: write error: Broken pipe
--> Step 6: Successfully enabled UYOK

These errors do not seem to prevent the cluster from working,
but we would like to be sure of that, and to fix them if
possible.
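
As a first check on the scriptlet failures above: exit status 127 is what the shell returns when a command cannot be found, so the %post scriptlets for pvm and torque presumably ran before chown and cat existed inside the image. The sketch below (Python, not part of OSCAR; the function name and the list of directories searched are our own assumptions) reports which of those commands are still missing from the finished image tree, using the image path shown in the log.

```python
import os

# Image root as it appears in the log above; adjust if the image lives elsewhere.
IMAGE_ROOT = "/var/lib/systemimager/images/fc8-i386"


def missing_commands(image_root, commands=("chown", "cat"),
                     bin_dirs=("bin", "usr/bin", "sbin", "usr/sbin")):
    """Return the commands with no executable file in any of the image's bin dirs."""
    missing = []
    for cmd in commands:
        # os.access() returns False for paths that do not exist.
        found = any(
            os.access(os.path.join(image_root, d, cmd), os.X_OK)
            for d in bin_dirs
        )
        if not found:
            missing.append(cmd)
    return missing


if __name__ == "__main__":
    print(missing_commands(IMAGE_ROOT))
```

An empty list would suggest the commands were installed later in the same transaction, in which case only the work done by the failed %post scriptlets would need to be checked by hand.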

3/ Finally, within the post_install script, we ran into some problems:

img_datasources: images: /var/lib/oscar/packages/ganglia/.configs/fc8-i386
finding gmetad config for image fc8-i386
[ganglia] configurator values for image fc8-i386 not found!
[ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
...

--> About to run /var/lib/oscar/packages/opium/api-post-deploy for opium
image:
$VAR1 = 'fc8-i386';
---------------
gexec_cluster() XML_ParseBuffer() error at line 1:
no element found

We have no idea what is going on here...

The configuration of TORQUE also led to some problems:

--> About to run /var/lib/oscar/packages/torque/api-post-deploy for torque
[torque] Updating pbs_server nodes
/opt/pbs/bin/pbsnodes: Server has no node list MSG=node list is empty
qmgr obj=oscarnode1.lcmi.local svr=default: Unauthorized Request 
create node oscarnode1.lcmi.local np = 1 , properties = all
qmgr obj=oscarnode2.lcmi.local svr=default: Unauthorized Request 
create node oscarnode2.lcmi.local np = 1 , properties = all
Shutting down TORQUE Server: [OK]
Starting TORQUE Server: [OK]
[torque] Creating TORQUE workq queue...
Max open servers: 4
qmgr obj=workq svr=default: Unauthorized Request 
create queue workq
Configuration of TORQUE queues failed...
 
We have managed to fix this last issue. In our configuration
the headnode has two NICs (eth0: private, eth1: public).
If we put the DNS name of the headnode in
/var/spool/pbs/server_name, we are able to configure TORQUE properly.
Could this be handled automatically when multiple NICs are detected?
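
To illustrate what such automation might look like (a sketch only, not how OSCAR behaves today): count the non-loopback interfaces and, when there is more than one, write the chosen headnode name into server_name. The helper names and the example host name are hypothetical, and reading /sys/class/net assumes a Linux kernel with sysfs mounted.

```python
import os


def list_nics(sys_net="/sys/class/net"):
    """Return the non-loopback interface names listed under sysfs."""
    return sorted(name for name in os.listdir(sys_net) if name != "lo")


def write_server_name(head_name, nics, path="/var/spool/pbs/server_name"):
    """Write head_name to the TORQUE server_name file if several NICs exist.

    Returns True when the file was written, False when it was left untouched.
    """
    if len(nics) < 2:
        return False
    with open(path, "w") as handle:
        handle.write(head_name + "\n")
    return True


# Example call (hypothetical host name; needs root, and pbs_server
# has to be restarted afterwards):
# write_server_name("headnode.example.local", list_nics())
```

Which name to write is still a manual choice here; in our case it was the DNS name of the headnode.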

We have enabled the headnode to be "part of the cluster" by
checking "Run batch system client (pbs_mom) on head node" in the
TORQUE configuration. Why is the headnode not part of the pbs_server
nodes then?
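
A quick way to see which nodes pbs_server actually knows about is to list them with pbsnodes -a and look for the headnode. In the sketch below the helper names are our own, and it assumes the usual pbsnodes -a layout, where each node name starts in the first column and its attributes are indented. The pbsnodes path is the one from the log above.

```python
import subprocess

PBSNODES = "/opt/pbs/bin/pbsnodes"  # path as it appears in the log above


def node_names(pbsnodes_output):
    """Extract node names: non-empty lines that do not start with whitespace."""
    return [
        line.strip()
        for line in pbsnodes_output.splitlines()
        if line.strip() and not line[0].isspace()
    ]


def registered_nodes():
    """Run `pbsnodes -a` and return the node names it prints on stdout."""
    result = subprocess.run(
        [PBSNODES, "-a"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return node_names(result.stdout)


def headnode_registered(headnode, names):
    """True if the given headnode name is among the registered node names."""
    return headnode in names
```

On the headnode, headnode_registered(name, registered_nodes()) with the headnode's own name would then tell whether it made it into the node list.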

4/ Finally, the tests ran well except for the Ganglia setup test,
which reports 3 nodes instead of 2! I suppose this is connected
with the fact that the headnode should be part of the PBS nodes (see
above).

