Hi YoungJun:
 
Please refer to the installation manual for further details, but generally there are two ways, both of which require you to collect the MAC addresses of the corresponding nodes, choose an image to associate with the nodes, and then reboot the machines so that they get imaged.
 
If the nodes support PXE boot, it's just a matter of hitting 'Setup Network Boot' in the Setup Networking menu and rebooting the machine.  If not, then you can make an autoinstall floppy and use that instead.  I would highly recommend that you read the relevant sections of the installation guide before continuing.
 
Good luck.
 
Cheers,
 
Bernard


From: YoungJun Kim [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 05, 2005 8:17
To: Bernard Li; [email protected]
Subject: Re: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)

Hi Bernard:
 
Thank you for your help. I think I have built the image successfully. How do I deploy it to the client? Should I use PXE boot?
 
Thanks,
YoungJun
----- Original Message -----
From: Bernard Li
Sent: Tuesday, April 05, 2005 2:06 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)

Hi YoungJun:
 
You do not need to do any manual installation of the OS on your client node (that is only necessary on the headnode).  All you need to do is use OSCAR to build the image, and then deploy it.
 
Cheers,
 
Bernard


From: YoungJun Kim [mailto:[EMAIL PROTECTED]
Sent: Tue 05/04/2005 1:39 AM
To: Bernard Li
Subject: Re: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)

On my client node, I reinstalled the OS from scratch (RHEL WS version 3, Update 3) and did not change anything except the /etc/ssh/sshd_config file.
 
Here is the output I got while running Step 7.
 
=============================================================================
== Running step 7 of the OSCAR wizard: Complete cluster setup
=============================================================================
 
--> Step 7: Running: ./post_install
Gathering processor count from client4.vrlab.
[EMAIL PROTECTED]'s password:
Updating database for machine client4.vrlab.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 44 bytes  read 20 bytes  14.22 bytes/sec
total size is 339  speedup is 5.30
--> About to run /opt/oscar/packages/oda/scripts/post_install for oda
generating the /etc/odaserver file on all oscar clients
. /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
--> About to run /opt/oscar/packages/torque/scripts/post_install for torque
[EMAIL PROTECTED]'s password:
mkstemp /var/spool/pbs/mom_priv/.config.Q5UZbo failed: No such file or directory
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
PBS mom config file updated with clienthost: server.vrlab
Pushing config file to clients...
building file list ... done
config
wrote 151 bytes  read 36 bytes  53.43 bytes/sec
total size is 96  speedup is 0.51
Sending SIGHUP to all moms...
************************* oscar_cluster *************************
--------- client4---------
pbs_mom: no process killed
Updating pbs_server nodes
set node client4.vrlab np = 1
Shutting down PBS Server:                                  [  OK  ]
Starting PBS Server:                                       [  OK  ]
Creating pbs workq queue...
Max open servers: 4
set queue workq resources_max.ncpus = 1
set queue workq resources_max.nodect = 1
set queue workq resources_available.nodect = 1
set server resources_available.ncpus = 1
set server resources_available.nodect = 1
set server resources_available.nodes = 1
set server resources_max.ncpus = 1
set server resources_max.nodes = 1
set server scheduler_iteration = 60
set server log_events = 64
Shutting down MAUI Scheduler: vr                           [  OK  ]
Starting MAUI Scheduler:                                   [  OK  ]
--> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
Setting default for tag mpi ("lam-7.0.6")
Attribute successfully set; new attribute setting will be effective for
future shells
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.vs8c6H failed: No such file or directory
wrote 237 bytes  read 36 bytes  109.20 bytes/sec
total size is 188  speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
--> About to run /opt/oscar/packages/pfilter/scripts/post_install for pfilter
(re)starting the pfilter firewall service on this server
/etc/init.d/pfilter restart
Restarting pfilter:vr                                      [  OK  ]
pushing out the clients pfilter firewall configuration file
. /etc/profile.d/c3.sh && cpush /etc/pfilter.conf.clients /etc/pfilter.conf
[EMAIL PROTECTED]'s password:
Permission denied, please try again.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 59 bytes  read 20 bytes  12.15 bytes/sec
total size is 855  speedup is 10.82
(re)starting the pfilter firewall service on the clients
. /etc/profile.d/c3.sh && cexec /etc/init.d/pfilter restart
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/pfilter: No such file or directory
--> About to run /opt/oscar/packages/opium/scripts/post_install for opium
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.mOMpxL failed: No such file or directory
wrote 237 bytes  read 36 bytes  78.00 bytes/sec
total size is 188  speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 46 bytes  read 20 bytes  26.40 bytes/sec
total size is 596  speedup is 9.03
[EMAIL PROTECTED]'s password:
building file list ... done
passwd
wrote 81 bytes  read 54 bytes  38.57 bytes/sec
total size is 2056  speedup is 15.23
[EMAIL PROTECTED]'s password:
building file list ... done
group
wrote 80 bytes  read 48 bytes  28.44 bytes/sec
total size is 720  speedup is 5.62
[EMAIL PROTECTED]'s password:
building file list ... done
shadow
wrote 81 bytes  read 48 bytes  51.60 bytes/sec
total size is 1245  speedup is 9.65
--> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
ntpd: Removing firewall opening for 127.127.1.0 port 123iptables: Bad rule (does a matching rule exist in that chain?)
[FAILED]
Shutting down ntpd: [FAILED]
ntpd: Opening firewall for input from 127.127.1.0 port 123[  OK  ]
Starting ntpd: [  OK  ]
--> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
oscar_loghost already set
--> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
[EMAIL PROTECTED]'s password:
building file list ... done
gmond.conf
wrote 85 bytes  read 72 bytes  62.80 bytes/sec
total size is 3710  speedup is 23.63
Shutting down GANGLIA gmond:                               [  OK  ]
Shutting down GANGLIA gmetad:                              [  OK  ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmond:                                    [  OK  ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmetad:                                   [  OK  ]
--> About to run /opt/oscar/packages/disable-services/scripts/post_install for disable-services
POSTFIX is running
Postfix is succesfully configured. : SERVER NODE
Shutting down postfix:                                     [FAILED]
Starting postfix:                                          [  OK  ]
- finished configuring postfix
Cluster setup complete!
--> Step 7: Successfully completed the cluster install
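
A note on reading the log above: the rsync lines of the form "mkstemp ... failed: No such file or directory" usually mean that the destination directory does not exist on the client, which would be consistent with a client that was installed by hand rather than from an OSCAR image. The following is only a rough sketch, not part of OSCAR; the function name missing_directories is made up for illustration. It pulls the affected directories out of a saved copy of the log:

```python
import posixpath
import re

# Matches rsync lines such as:
#   mkstemp /opt/env-switcher/etc/.switcher.ini.vs8c6H failed: No such file or directory
MKSTEMP_RE = re.compile(r"^mkstemp (\S+) failed: No such file or directory")


def missing_directories(log_text):
    """Return the destination directories rsync could not write into.

    Hypothetical helper: it only looks at 'mkstemp ... failed' lines and
    returns each parent directory once, in order of first appearance.
    """
    dirs = []
    for line in log_text.splitlines():
        match = MKSTEMP_RE.match(line.strip())
        if match:
            parent = posixpath.dirname(match.group(1))
            if parent not in dirs:
                dirs.append(parent)
    return dirs
```

Run against the Step 7 output, this should list /var/spool/pbs/mom_priv and /opt/env-switcher/etc, which are the directories to check for on the client.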
 
Thanks,
YoungJun
----- Original Message -----
From: Bernard Li
Sent: Tuesday, April 05, 2005 1:08 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)

Hi YoungJun:
 
Is /home mounted on all your compute nodes?  (It should be mounted off your headnode.)
 
Also, have you done the step 'Complete Cluster Install'?
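
For the /home question, one quick way to check on a node is to look at its mount table. Below is a minimal sketch, assuming the standard Linux /proc/mounts layout (source, mount point, filesystem type, options, ...); the helper name home_mount_source is hypothetical and not an OSCAR tool:

```python
def home_mount_source(mounts_text):
    """Return what is mounted at /home, or None if /home is not a separate mount.

    mounts_text is expected in /proc/mounts format:
        <source> <mount point> <fs type> <options> <dump> <pass>
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/home":
            # A later entry for the same mount point hides earlier ones.
            source = fields[0]
    return source
```

On a compute node you could pass it the contents of /proc/mounts, e.g. home_mount_source(open("/proc/mounts").read()). If the result is None, or is a local device rather than an export from the headnode, then /home is not being mounted from the headnode.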
 
Cheers,
 
Bernard


From: [EMAIL PROTECTED] on behalf of YoungJun Kim
Sent: Tue 05/04/2005 1:01 AM
To: [email protected]
Subject: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)

Hi all,
 
I tried to test the cluster setup and got the following errors.
 
Performing root tests...
Shutting down PBS Server:                                  [  OK  ]
Connection refused
/opt/pbs/bin/pbsnodes: cannot connect to server pbs_oscar, error=111
PBS node check                                                 [PASSED]
Starting PBS Server:                                       [  OK  ]
PBS service check:pbs_server                                   [PASSED]
Maui service check:maui                                        [PASSED]
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
/home mounts                    client4.vrlab                  [EMAIL PROTECTED]
b's password:
/home mounts                    1 nodes failed                 [FAILED]
 
Preparing user tests...
Performing user tests...
SSH ping test                                                  [PASSED]
SSH server->node                                               [EMAIL PROTECTED].
vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH server->node                                               [FAILED]
SSH node->server                                               [EMAIL PROTECTED].
vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH node->server                                               [FAILED]
PBS default queue definition                                   [PASSED]
Checking for 1 free nodes:                                     [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes:                                     [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes:                                     [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes:                                     [FAILED]
Not enough free nodes. Tests incomplete.
Ganglia test                                                   [FAILED]
There were issues running some user test scripts.  Please check your logs
 
...Hit <ENTER> key to exit...
 
Any ideas?
 
Thank you,
YoungJun
