Hi Bernard:
1. I collected the MAC addresses of the clients and assigned them to the client nodes when I defined the clients.
2. The client is able to boot from the network (PXE), so I enabled PXE boot on it.
3. When I rebooted the client, it connected to the server, but it couldn't find DISK0.
The following messages were printed when I started the client with PXE:
Loading ide-scsi...
Assuming ide-scsi is compiled into the kernel, not needed, or already loaded.
get_hostname_by_hosts_file
Host file exists...
Searching for this machine's hostname in /scripts/hosts by IP: 192.168.1.4
This hosts name is: client4
run_pre_install_scripts
>>> 99all.harmless_example_script
I live in /var/lib/systemimager/scripts/pre-install.
chose_autoinstall_script
Using autoinstall script: /scripts/client4.sh
write_variables
run_autoinstall_script
>>> /scripts/client4.sh
get_arch
DISKORDER=sd,hd,cciss,ida,rd
enumerate_disks
DISKS=0
Undefinded: DISK0
Killing off running processes.
write_varibles
Thanks,
YoungJun
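The log above shows DISKORDER=sd,hd,cciss,ida,rd followed by DISKS=0, which suggests that no block device matching any of the listed name prefixes was found. As a rough illustration only (this is not SystemImager's actual code; the function name, the regular expression, and the /proc/partitions-style input are all assumptions made for this sketch), ordering disks by such a prefix list might look like this:

```python
import re

# Assumed naming rule for this sketch: whole disks look like "sda"/"hdb"
# or "cciss/c0d0"; anything with a trailing partition suffix is skipped.
WHOLE_DISK = re.compile(r"^(?:(?:sd|hd)[a-z]+|(?:cciss|ida|rd)/c\d+d\d+)$")


def enumerate_disks(partitions_text, disk_order="sd,hd,cciss,ida,rd"):
    """Return whole-disk names ordered by the prefixes in disk_order.

    partitions_text is assumed to be formatted like /proc/partitions:
    a header line, a blank line, then "major minor #blocks name" rows.
    """
    names = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a numeric major number.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        names.append(fields[3])

    disks = []
    for prefix in disk_order.split(","):
        disks.extend(
            sorted(n for n in names if n.startswith(prefix) and WHOLE_DISK.match(n))
        )
    return disks
```

With an empty result there is no first entry to assign to DISK0, which matches the "DISKS=0" line in the log. One possible cause is that the install kernel has no driver for the client's disk controller, but the log by itself does not confirm that.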
----- Original Message -----
Sent: Tuesday, April 05, 2005 10:20 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
Please refer to the installation manual for further details, but generally there are two ways. Both require you to collect the MAC addresses of the corresponding nodes, choose an image to associate with the nodes, and then reboot the machines so that they get imaged.
If the nodes support PXE boot, it's just a matter of hitting 'Setup Network Boot' in the Setup Networking menu and rebooting the machine. If not, you can make an autoinstall floppy and use that instead. I would highly recommend that you read the relevant sections of the installation guide before continuing.
Good luck.
Cheers,
Bernard
Hi Bernard:
Thank you for your help. I think I have built the image successfully. How do I deploy it to the client? Should I use PXE boot?
Thanks,
YoungJun
----- Original Message -----
Sent: Tuesday, April 05, 2005 2:06 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
You do not need to do any manual installation of the OS on your client node (it is only necessary on the headnode). All you need to do is use OSCAR to build the image and then deploy it.
Cheers,
Bernard
From: YoungJun Kim [mailto:[EMAIL PROTECTED]
Sent: Tue 05/04/2005 1:39 AM
To: Bernard Li
Subject: Re: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
On my client node, I reinstalled from scratch (WS version 3, Update 3) and did not change anything except the /etc/ssh/sshd_config file.
Here is the output I got while running Step 7:
===============================================================================
Running step 7 of the OSCAR wizard: Complete cluster setup
=============================================================================
--> Step 7: Running: ./post_install
Gathering processor count from client4.vrlab.
[EMAIL PROTECTED]'s password:
Updating database for machine client4.vrlab.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 44 bytes read 20 bytes 14.22 bytes/sec
total size is 339 speedup is 5.30
--> About to run /opt/oscar/packages/oda/scripts/post_install for oda
generating the /etc/odaserver file on all oscar clients
. /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
--> About to run /opt/oscar/packages/torque/scripts/post_install for torque
[EMAIL PROTECTED]'s password:
mkstemp /var/spool/pbs/mom_priv/.config.Q5UZbo failed: No such file or directory
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
PBS mom config file updated with clienthost: server.vrlab
Pushing config file to clients...
building file list ... done
config
wrote 151 bytes read 36 bytes 53.43 bytes/sec
total size is 96 speedup is 0.51
Sending SIGHUP to all moms...
************************* oscar_cluster *************************
--------- client4---------
pbs_mom: no process killed
Updating pbs_server nodes
set node client4.vrlab np = 1
Shutting down PBS Server: [ OK ]
Starting PBS Server: [ OK ]
Creating pbs workq queue...
Max open servers: 4
set queue workq resources_max.ncpus = 1
set queue workq resources_max.nodect = 1
set queue workq resources_available.nodect = 1
set server resources_available.ncpus = 1
set server resources_available.nodect = 1
set server resources_available.nodes = 1
set server resources_max.ncpus = 1
set server resources_max.nodes = 1
set server scheduler_iteration = 60
set server log_events = 64
Shutting down MAUI Scheduler: [ OK ]
Starting MAUI Scheduler: [ OK ]
--> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
Setting default for tag mpi ("lam-7.0.6")
Attribute successfully set; new attribute setting will be effective for future shells
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.vs8c6H failed: No such file or directory
wrote 237 bytes read 36 bytes 109.20 bytes/sec
total size is 188 speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
--> About to run /opt/oscar/packages/pfilter/scripts/post_install for pfilter
(re)starting the pfilter firewall service on this server
/etc/init.d/pfilter restart
Restarting pfilter: [ OK ]
pushing out the clients pfilter firewall configuration file
. /etc/profile.d/c3.sh && cpush /etc/pfilter.conf.clients /etc/pfilter.conf
[EMAIL PROTECTED]'s password:
Permission denied, please try again.
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 59 bytes read 20 bytes 12.15 bytes/sec
total size is 855 speedup is 10.82
(re)starting the pfilter firewall service on the clients
. /etc/profile.d/c3.sh && cexec /etc/init.d/pfilter restart
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/pfilter: No such file or directory
--> About to run /opt/oscar/packages/opium/scripts/post_install for opium
[EMAIL PROTECTED]'s password:
building file list ... done
switcher.ini
mkstemp /opt/env-switcher/etc/.switcher.ini.mOMpxL failed: No such file or directory
wrote 237 bytes read 36 bytes 78.00 bytes/sec
total size is 188 speedup is 0.69
rsync error: some files could not be transferred (code 23) at main.c(620)
[EMAIL PROTECTED]'s password:
building file list ... done
wrote 46 bytes read 20 bytes 26.40 bytes/sec
total size is 596 speedup is 9.03
[EMAIL PROTECTED]'s password:
building file list ... done
passwd
wrote 81 bytes read 54 bytes 38.57 bytes/sec
total size is 2056 speedup is 15.23
[EMAIL PROTECTED]'s password:
building file list ... done
group
wrote 80 bytes read 48 bytes 28.44 bytes/sec
total size is 720 speedup is 5.62
[EMAIL PROTECTED]'s password:
building file list ... done
shadow
wrote 81 bytes read 48 bytes 51.60 bytes/sec
total size is 1245 speedup is 9.65
--> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
ntpd: Removing firewall opening for 127.127.1.0 port 123iptables: Bad rule (does a matching rule exist in that chain?)
[FAILED]
Shutting down ntpd: [FAILED]
ntpd: Opening firewall for input from 127.127.1.0 port 123[ OK ]
Starting ntpd: [ OK ]
--> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
oscar_loghost already set
--> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
[EMAIL PROTECTED]'s password:
building file list ... done
gmond.conf
wrote 85 bytes read 72 bytes 62.80 bytes/sec
total size is 3710 speedup is 23.63
Shutting down GANGLIA gmond: [ OK ]
Shutting down GANGLIA gmetad: [ OK ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmond: [ OK ]
[EMAIL PROTECTED]'s password:
************************* oscar_cluster *************************
--------- client4---------
bash: line 1: /etc/init.d/gmond: No such file or directory
Starting GANGLIA gmetad: [ OK ]
--> About to run /opt/oscar/packages/disable-services/scripts/post_install for disable-services
POSTFIX is running
Postfix is succesfully configured. : SERVER NODE
Shutting down postfix: [FAILED]
Starting postfix: [ OK ]
- finished configuring postfix
Cluster setup complete!
--> Step 7: Successfully completed the cluster install
Thanks,
YoungJun
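Although Step 7 ends with "Successfully completed the cluster install", several post_install scripts in the output above report problems (mkstemp failures, rsync error code 23, missing /etc/init.d scripts on client4). A minimal sketch for grouping such lines by package is shown below. The function name and the list of error markers are assumptions made for this illustration, not part of OSCAR, and the sketch expects each log record to sit on a single line.

```python
import re

# Path pattern taken from the "About to run ..." lines in the Step 7 output.
PACKAGE_RE = re.compile(r"/opt/oscar/packages/([^/\s]+)/scripts/post_install")

# Assumed markers for this sketch; extend as needed.
ERROR_MARKERS = ("rsync error", "No such file", "[FAILED]")


def errors_by_package(log_text):
    """Map each package name to the log lines that contain an error marker."""
    result = {}
    current = None
    for line in log_text.splitlines():
        match = PACKAGE_RE.search(line)
        if match:
            # A new post_install section starts here.
            current = match.group(1)
            result.setdefault(current, [])
            continue
        if current is not None and any(m in line for m in ERROR_MARKERS):
            result[current].append(line.strip())
    return result
```

Run against a saved copy of the Step 7 output, this would list the error lines under each package (for example torque, switcher, pfilter, opium, ganglia), which makes it easier to see that files and directories expected on the client are missing.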
----- Original Message -----
Sent: Tuesday, April 05, 2005 1:08 AM
Subject: RE: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi YoungJun:
Is /home mounted on all your compute nodes? (It should be mounted from your headnode.)
Also, have you done the step 'Complete Cluster Install'?
Cheers,
Bernard
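One way to answer the /home question on a compute node is to look at the node's mount table. The sketch below parses text in the format of /proc/mounts and reports where a given mount point comes from. The function name is hypothetical, and the sample values in the comments (an NFS export from server.vrlab) are only an assumption about how such a cluster might be set up.

```python
def find_mount(mounts_text, mount_point="/home"):
    """Return (device, fstype) for mount_point, or None if it is not mounted.

    mounts_text is assumed to be formatted like /proc/mounts, e.g.
    "server.vrlab:/home /home nfs rw 0 0" (sample values are illustrative).
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            # Later entries shadow earlier ones for the same mount point.
            found = (fields[0], fields[2])
    return found
```

On the node itself, the input could come from reading /proc/mounts. A result of None would mean /home is just a local directory on that node rather than one mounted from the headnode.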
From: [EMAIL PROTECTED] on behalf of YoungJun Kim
Sent: Tue 05/04/2005 1:01 AM
To: [email protected]
Subject: [Oscar-users] Error message while testing Cluster Setup- RHEL WS version 3 (Taroon Update 3)
Hi all,
I tried to run the cluster setup tests and got the following errors:
Preparing user tests...
Performing user tests...
SSH ping test [PASSED]
SSH server->node
[EMAIL PROTECTED].vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH server->node [FAILED]
SSH node->server
[EMAIL PROTECTED].vrlab's password:
[EMAIL PROTECTED]'s password:
[EMAIL PROTECTED]'s password:
SSH node->server [FAILED]
PBS default queue definition [PASSED]
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Checking for 1 free nodes: [FAILED]
Not enough free nodes. Tests incomplete.
Ganglia test [FAILED]
There were issues running some user test scripts.
Please check your logs
...Hit <ENTER> key to exit...
Any ideas?
Thank you,
YoungJun
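The password prompts during the "SSH server->node" and "SSH node->server" tests above suggest that key-based login is not working for the test user. A minimal sketch for checking this non-interactively follows. It relies on the OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout; the function names are hypothetical, and the hostname in the usage note is taken from the thread.

```python
import subprocess


def ssh_check_command(host, timeout=5):
    """Build an ssh command that runs `true` on host without ever prompting."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # never ask for a password
        "-o", "ConnectTimeout=%d" % timeout,   # give up quickly if unreachable
        host,
        "true",
    ]


def can_login_without_password(host, timeout=5):
    """Return True if ssh to host succeeds without a password prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ssh is not installed or could not be started.
        return False
    return result.returncode == 0
```

Calling can_login_without_password("client4.vrlab") from the headnode, as the same user that runs the tests, would return False in the situation shown above, since ssh would need a password to proceed.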