Hi,
I am trying to set up a cluster for our HPC center using OSCAR, but I have a
problem with step 7, installing the cluster.
Here is my problem:
When I run step 7, after some time a "tftp time out" error appears on the
client node and the node terminates the boot agent. On the server node, the
following message appears:
"Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
Current FSM is SSH_Main_SSHProcess"
Here is the complete log of step 7:
--------------------------------------------------------------------------
--> Update Wizard Env (as needed)
--> Step 7: Running: ./post_install
Gathering processor count from oscarnode1.clusternet.
ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
Improper count (0) returned from machine oscarnode1.clusternet at
./post_install line 83
    main::get_numproc() called at ./post_install line 39
ssh: connect to host oscarnode1 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
--> About to run /opt/oscar/packages/loghost/scripts/post_install for
loghost
************************* oscar_cluster *************************
--------- oscarnode1---------
ssh: connect to host oscarnode1 port 22: Connection timed out
--> About to run /opt/oscar/packages/ganglia/scripts/post_install for
ganglia
[ganglia] Ganglia gmond configuration file modified, re-starting daemon...
Shutting down GANGLIA gmond: [  OK  ]
Starting GANGLIA gmond: [  OK  ]
editing /etc/gmetad.conf
match: gridname\s+.*
match: data_source\s+.*
[ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
Shutting down GANGLIA gmetad: [  OK  ]
Starting GANGLIA gmetad: [  OK  ]
[ganglia] Starting up apache...
Stopping httpd: [  OK  ]
Starting httpd: [  OK  ]
[ganglia] Ganglia page is located at http://server.clusternet/ganglia/
************************* oscar_cluster *************************
--------- oscarnode1---------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
--> About to run /opt/oscar/packages/torque/scripts/post_install for torque
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
TORQUE mom config file updated with clienthost: server.clusternet
Pushing config file to clients...
Sending SIGHUP to all moms...
************************* oscar_cluster *************************
--------- oscarnode1---------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
[torque] Updating pbs_server nodes
/opt/pbs/bin/pbsnodes: Server has no node list
Shutting down TORQUE Server: [  OK  ]
Starting TORQUE Server: [  OK  ]
[torque] Creating TORQUE workq queue...
Max open servers: 4
set queue workq resources_max.ncpus = 0
set queue workq resources_max.nodect = 0
set queue workq resources_available.nodect = 0
set server resources_available.ncpus = 0
set server resources_available.nodect = 0
set server resources_available.nodes = 0
set server resources_max.ncpus = 0
set server resources_max.nodes = 0
set server scheduler_iteration = 60
set server log_events = 64
Shutting down MAUI Scheduler: [  OK  ]
Starting MAUI Scheduler: [  OK  ]
--> About to run /opt/oscar/packages/switcher/scripts/post_install for
switcher
Setting default for tag mpi ("lam-7.1.2")
Attribute successfully set; new attribute setting will be effective for
future shells
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
--> About to run /opt/oscar/packages/mta-config/scripts/post_install for
mta-config
************************************ WARNING ************************************
OSCAR could not set up the configuration for any mailing service on the server.
The current version of the mta-config package in OSCAR only supports the
Postfix mail transfer agent (MTA).
It looks like you have another MTA installed (e.g, sendmail or exim); as such,
please be aware that OSCAR will not automatically configure it.
************************************ WARNING ************************************
--> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for
ntpconfig
Shutting down ntpd: [  OK  ]
Starting ntpd: [  OK  ]
************************* oscar_cluster *************************
--------- oscarnode1---------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
--> About to run /opt/oscar/packages/opium/scripts/post_install for opium
Not all hosts were accessible by c3! Will retry the update later
Could not find template for file switcher.ini
If this contains distro-specific lines, please create a template!
image:
$VAR1 = 'oscarimage';
---------------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
Could not find template for file gshadow
If this contains distro-specific lines, please create a template!
image:
$VAR1 = 'oscarimage';
---------------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
image:
$VAR1 = 'oscarimage';
---------------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
image:
$VAR1 = 'oscarimage';
---------------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
image:
$VAR1 = 'oscarimage';
---------------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
--> About to run /opt/oscar/packages/oda/scripts/post_install for oda
generating the /etc/odaserver file on all oscar clients
. /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
************************* oscar_cluster *************************
--------- oscarnode1---------
Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
Server
Current FSM is SSH_Main_SSHProcess
Cluster setup complete!
--> Step 7: Successfully completed the cluster install
--> Update Wizard Env (as needed)
-----------------------------------------------------------------------------------
P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x because
it has a problem with my graphics cards.
Best regards,

-- 
A.Nazemian