Hi Bernard:
1. I collected the MAC addresses of the clients and assigned them to the
client nodes when I defined the clients.
2. The client can boot from the network (PXE), so I enabled PXE boot on
the client.
3. When I rebooted the client, it connected to the server, but it could
not find DISK0.
The following messages were printed when I booted the client with PXE:
Loading ide-scsi...
Assuming ide-scsi is compiled into the kernel, not needed, or already loaded.
get_hostname_by_hosts_file
Host file exists...
Searching for this machine's hostname in /scripts/hosts by IP: 192.168.1.4
This hosts name is: client4
run_pre_install_scripts
>>> 99all.harmless_example_script
I live in /var/lib/systemimager/scripts/pre-install.
chose_autoinstall_script
Using autoinstall script: /scripts/client4.sh
write_variables
run_autoinstall_script
>>> /scripts/client4.sh
get_arch
DISKORDER=sd,hd,cciss,ida,rd
enumerate_disks
DISKS=0
Undefinded: DISK0
Killing off running processes.
write_varibles
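For anyone reading this log later: the DISKORDER / DISKS=0 lines suggest that the installer looked for disks by device-name prefix and found none, which is why DISK0 is undefined. Below is a minimal Python sketch of that kind of enumeration, not the installer's actual code. The function name, the regular expressions, and the use of /proc/partitions-style text as input are all assumptions made for illustration. An empty result would be consistent with the install kernel not seeing any disk (for example, a missing controller driver), but the log alone does not confirm that cause.

```python
import re

# Hypothetical patterns for whole-disk device names, keyed by DISKORDER prefix.
# Partitions such as "sda1" or "cciss/c0d0p1" are deliberately not matched.
WHOLE_DISK = {
    "sd": re.compile(r"^sd[a-z]+$"),
    "hd": re.compile(r"^hd[a-z]+$"),
    "cciss": re.compile(r"^cciss/c\d+d\d+$"),
    "ida": re.compile(r"^ida/c\d+d\d+$"),
    "rd": re.compile(r"^rd/c\d+d\d+$"),
}


def enumerate_disks(partitions_text, diskorder="sd,hd,cciss,ida,rd"):
    """Return whole-disk names from /proc/partitions-style text, in DISKORDER order."""
    names = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows start with numeric major and minor; the name is the 4th column.
        if len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            names.append(fields[3])

    disks = []
    for kind in diskorder.split(","):
        pattern = WHOLE_DISK.get(kind)
        if pattern is None:
            continue  # unknown prefix: skip rather than guess
        disks.extend(name for name in names if pattern.match(name))
    return disks


if __name__ == "__main__":
    sample = (
        "major minor  #blocks  name\n"
        "\n"
        "   8     0   78150744 sda\n"
        "   8     1     104391 sda1\n"
    )
    disks = enumerate_disks(sample)
    print("DISKS=%d" % len(disks))
    for index, name in enumerate(disks):
        print("DISK%d=/dev/%s" % (index, name))
```

If a check like this (or simply reading /proc/partitions by hand on the client) shows no whole-disk entries, the empty list corresponds to the DISKS=0 line in the log above.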
Thanks,
YoungJun
----- Original Message -----
Sent: Tuesday, April 05, 2005 10:20 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
Please refer to the installation manual for further details, but
generally there are two ways, both of which require you to collect the
MAC addresses of the corresponding nodes, choose an image to tie to the
nodes, then reboot the machines so that they get imaged.
If the nodes support PXE boot, it's just a matter of hitting 'Setup
Network Boot' in the Setup Networking menu and rebooting the machine. If
not, you can make an autoinstall floppy and use that instead. I would
highly recommend that you read the relevant sections of the installation
guide before continuing.
Good luck.
Cheers,
Bernard
Hi Bernard:
Thank you for your help. I think I have
built the image successfully. How do I deploy it to the client? Should I
use PXE boot?
Thanks,
YoungJun
----- Original Message -----
Sent: Tuesday, April 05, 2005 2:06 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
You do not need to do any manual installation of the OS on your client
node (it is only necessary on the headnode). All you need to do is use
OSCAR to build the image, and then deploy it.
Cheers,
Bernard
From: YoungJun Kim [mailto:[EMAIL PROTECTED]]
Sent: Tue 05/04/2005 1:39 AM
To: Bernard Li
Subject: Re: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
On my client node, I reinstalled from scratch (WS version 3, Update 3)
and did not change anything except the /etc/ssh/sshd_config file.
Here is the output from Step 7 of the installation:
===============================================================================
Running step 7 of the OSCAR wizard: Complete cluster setup
=============================================================================
--> Step 7: Running: ./post_install
Gathering processor count from client4.vrlab.
[EMAIL PROTECTED]'s password:
Updating database for machine client4.vrlab.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 44 bytes read 20 bytes 14.22 bytes/sec
total size is 339 speedup is 5.30
--> About to run /opt/oscar/packages/oda/scripts/post_install for oda
generating the /etc/odaserver file on all oscar clients
. /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
--> About to run /opt/oscar/packages/torque/scripts/post_install for torque
[EMAIL PROTECTED]'s password:
mkstemp /var/spool/pbs/mom_priv/.config.Q5UZbo failed: No such file or directory
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
PBS mom config file updated with clienthost: server.vrlab
Pushing config file to clients...
building file list ... done
config
wrote 151 bytes read 36 bytes 53.43 bytes/sec
total size is 96 speedup is 0.51
Sending SIGHUP to all moms...
************************* oscar_cluster *************************
--------- client4---------
pbs_mom: no process killed
Updating pbs_server nodes
set node client4.vrlab np = 1
Shutting down PBS Server: [ OK ]
Starting PBS Server: [ OK ]
Creating pbs workq queue...
Max open servers: 4
set queue workq resources_max.ncpus = 1
set queue workq resources_max.nodect = 1
set queue workq resources_available.nodect = 1
set server resources_available.ncpus = 1
set server resources_available.nodect = 1
set server resources_available.nodes = 1
set server resources_max.ncpus = 1
set server resources_max.nodes = 1
set server scheduler_iteration = 60
set server log_events = 64
Shutting down MAUI Scheduler: vr [ OK ]
Starting MAUI Scheduler: [ OK ]
--> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
Setting default for tag mpi ("lam-7.0.6")
Attribute successfully set; new attribute setting will be effective for future shells
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.vs8c6H failed: No such file or directory
wrote 237 bytes read 36 bytes 109.20 bytes/sec
total size is 188 speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
--> About to run /opt/oscar/packages/pfilter/scripts/post_install for pfilter
(re)starting the pfilter firewall service on this server
/etc/init.d/pfilter restart
Restarting pfilter:vr [ OK ]
pushing out the clients pfilter firewall configuration file
. /etc/profile.d/c3.sh && cpush /etc/pfilter.conf.clients /etc/pfilter.conf
[EMAIL PROTECTED]'s password:
Permission denied, please try again.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 59 bytes read 20 bytes 12.15 bytes/sec
total size is 855 speedup is 10.82
(re)starting the pfilter firewall service on the clients
. /etc/profile.d/c3.sh && cexec /etc/init.d/pfilter restart
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/pfilter: No such file or directory
--> About to run /opt/oscar/packages/opium/scripts/post_install for opium
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.mOMpxL failed: No such file or directory
wrote 237 bytes read 36 bytes 78.00 bytes/sec
total size is 188 speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 46 bytes read 20 bytes 26.40 bytes/sec
total size is 596 speedup is 9.03
[EMAIL PROTECTED]'s password:
building file list ... done
passwd
wrote 81 bytes read 54 bytes 38.57 bytes/sec
total size is 2056 speedup is 15.23
[EMAIL PROTECTED]'s password:
building file list ... done
group
wrote 80 bytes read 48 bytes 28.44 bytes/sec
total size is 720 speedup is 5.62
[EMAIL PROTECTED]'s password:
building file list ... done
shadow
wrote 81 bytes read 48 bytes 51.60 bytes/sec
total size is 1245 speedup is 9.65
--> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
ntpd: Removing firewall opening for 127.127.1.0 port 123iptables: Bad rule (does a matching rule exist in that chain?)
[FAILED]
Shutting down ntpd: [FAILED]
ntpd: Opening firewall for input from 127.127.1.0 port 123[ OK ]
Starting ntpd: [ OK ]
--> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
oscar_loghost already set
--> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
[EMAIL PROTECTED]'s password:
building file list ... done
gmond.conf
wrote 85 bytes read 72 bytes 62.80 bytes/sec
total size is 3710 speedup is 23.63
Shutting down GANGLIA gmond: [ OK ]
Shutting down GANGLIA gmetad: [ OK ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmond: [ OK ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmetad: [ OK ]
--> About to run /opt/oscar/packages/disable-services/scripts/post_install for disable-services
POSTFIX is running
Postfix is succesfully configured. : SERVER NODE
Shutting down postfix: [FAILED]
Starting postfix: [ OK ]
- finished configuring postfix
Cluster setup complete!
--> Step 7: Successfully completed the cluster install
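Note that the wizard reports success even though the output above contains repeated password prompts, rsync errors (code 23), and missing init scripts on client4. A small Python sketch like the following can tally those symptoms in a saved copy of the log. The function name, the pattern names, and the choice of patterns are assumptions for illustration; they are plain string matches against the messages shown above, not part of OSCAR.

```python
import re

# Hypothetical symptom patterns, taken from messages that appear in the log above.
PATTERNS = {
    "password_prompt": re.compile(r"'s password:"),
    "permission_denied": re.compile(r"Permission denied"),
    "rsync_error": re.compile(r"rsync error:"),
    "missing_path": re.compile(r"No such file or directory"),
    "service_failed": re.compile(r"\[FAILED\]"),
}


def summarize_log(log_text):
    """Count how many lines of log_text match each symptom pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[EMAIL PROTECTED]'s password:\n"
        "mkstemp /var/spool/pbs/mom_priv/.config.Q5UZbo failed: No such file or directory\n"
        "rsync error: some files could not be transferred (code 23) at main.c(620)\n"
        "Shutting down ntpd: [FAILED]\n"
    )
    for name, count in summarize_log(sample).items():
        print("%-18s %d" % (name, count))
```

Any non-zero password_prompt count suggests that key-based SSH from the server to the client is not in place, and the missing_path entries are consistent with a manually installed client that lacks the OSCAR packages the scripts expect.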
Thanks,
YoungJun
----- Original Message -----
Sent: Tuesday, April 05, 2005 1:08 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
Is /home mounted on all your compute nodes? (It should be mounted from
your headnode.)
Also, have you done the 'Complete Cluster Install' step?
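One way to check the /home question on a compute node is to look at its mount table. The Python sketch below parses /proc/mounts-style text and reports where /home comes from. The function names are hypothetical, and treating an "nfs" filesystem type as "mounted from the headnode" is an assumption about how /home is shared in this setup.

```python
def find_mount(mounts_text, mount_point="/home"):
    """Return (source, fstype) for mount_point from /proc/mounts-style text, or None."""
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each row is: source mount_point fstype options dump pass
        if len(fields) >= 3 and fields[1] == mount_point:
            result = (fields[0], fields[2])  # a later entry overrides an earlier one
    return result


def is_network_mount(entry):
    """True if the entry looks like an NFS mount (assumed sharing mechanism)."""
    return entry is not None and entry[1].startswith("nfs")


if __name__ == "__main__":
    # On a real node, read the live table instead of using a sample string.
    with open("/proc/mounts") as handle:
        entry = find_mount(handle.read())
    if is_network_mount(entry):
        print("/home is mounted from %s" % entry[0])
    else:
        print("/home does not appear to be a network mount")
```

If /home turns out to be local on a compute node, the user test account would not see the same home directory (and SSH keys) as on the headnode, which is the situation the question above is trying to rule out.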
Cheers,
Bernard
From: [EMAIL PROTECTED] on behalf of YoungJun Kim
Sent: Tue 05/04/2005 1:01 AM
To: [email protected]
Subject: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi all,
I tried to run the cluster setup test and got the following errors:
Preparing user tests...
Performing user tests...
SSH ping test [PASSED]
SSH server->node
[EMAIL PROTECTED].vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH server->node [FAILED]
SSH node->server
[EMAIL PROTECTED].vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH node->server [FAILED]
PBS default queue definition [PASSED]
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Ganglia test [FAILED]
There were issues running some user test scripts.
Please check your logs
...Hit <ENTER> key to exit...
Any ideas?
Thank you,
YoungJun