Did the image deployment complete successfully, and did the nodes reboot
into the OSCAR image?

Can you ssh to the compute nodes from the head node (without getting a
password prompt)?
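
A quick way to test that from the head node is sketched below. This is
only a rough check, not part of OSCAR: the function name check_node and
the SSH variable are made up for illustration, and it assumes the stock
OpenSSH client, whose BatchMode and ConnectTimeout options make ssh fail
instead of prompting or hanging.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with OSCAR.
# SSH can be overridden for a dry run; it defaults to the real ssh client.
SSH="${SSH:-ssh}"

check_node() {
    # BatchMode=yes makes ssh fail rather than ask for a password;
    # ConnectTimeout bounds the wait when the node is down or unreachable.
    if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED (unreachable, or a password prompt would be needed)"
        return 1
    fi
}

# Check every node name given on the command line.
for node in "$@"; do
    check_node "$node"
done
```

Saved under any name you like (say check_nodes.sh) and run as
"sh check_nodes.sh oscarnode1", it should print "oscarnode1: ok" once
the node is up and key-based login works.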

On Tue, Oct 28, 2008 at 3:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
> Hi.
> I want to set up a cluster for our HPC center using OSCAR, but I have a
> problem with step 7, installing the cluster.
> Here is my problem:
> When I run step 7, after some time a "tftp time out" error appears on the
> client node and the node terminates the boot agent. On the server node,
> "Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server Current FSM is SSH_Main_SSHProcess" appears.
> Here is the complete log of step 7:
> --------------------------------------------------------------------------
> --> Update Wizard Env (as needed)
> --> Step 7: Running: ./post_install
> Gathering processor count from oscarnode1.clusternet.
> ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
> Improper count (0) returned from machine oscarnode1.clusternet at
> ./post_install line 83
>     main::get_numproc() called at ./post_install line 39
> ssh: connect to host oscarnode1 port 22: Connection timed out
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/loghost/scripts/post_install for
> loghost
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> ssh: connect to host oscarnode1 port 22: Connection timed out
> --> About to run /opt/oscar/packages/ganglia/scripts/post_install for
> ganglia
> [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
> Shutting down GANGLIA gmond: [  OK  ]
> Starting GANGLIA gmond: [  OK  ]
> editing /etc/gmetad.conf
> match: gridname\s+.*
> match: data_source\s+.*
> [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
> Shutting down GANGLIA gmetad: [  OK  ]
> Starting GANGLIA gmetad: [  OK  ]
> [ganglia] Starting up apache...
> Stopping httpd: [  OK  ]
> Starting httpd: [  OK  ]
> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> --> About to run /opt/oscar/packages/torque/scripts/post_install for torque
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> TORQUE mom config file updated with clienthost: server.clusternet
> Pushing config file to clients...
> Sending SIGHUP to all moms...
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> [torque] Updating pbs_server nodes
> /opt/pbs/bin/pbsnodes: Server has no node list
> Shutting down TORQUE Server: [  OK  ]
> Starting TORQUE Server: [  OK  ]
> [torque] Creating TORQUE workq queue...
> Max open servers: 4
> set queue workq resources_max.ncpus = 0
> set queue workq resources_max.nodect = 0
> set queue workq resources_available.nodect = 0
> set server resources_available.ncpus = 0
> set server resources_available.nodect = 0
> set server resources_available.nodes = 0
> set server resources_max.ncpus = 0
> set server resources_max.nodes = 0
> set server scheduler_iteration = 60
> set server log_events = 64
> Shutting down MAUI Scheduler: [  OK  ]
> Starting MAUI Scheduler: [  OK  ]
> --> About to run /opt/oscar/packages/switcher/scripts/post_install for
> switcher
> Setting default for tag mpi ("lam-7.1.2")
> Attribute successfully set; new attribute setting will be effective for
> future shells
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/mta-config/scripts/post_install for
> mta-config
> ************************************ WARNING
> ************************************
> OSCAR could not set up the configuration for any mailing service on the
> server.
> The current version of the mta-config package in OSCAR only supports the
> Postfix mail transfer agent (MTA).
> It looks like you have another MTA installed (e.g, sendmail or exim); as
> such,
> please be aware that OSCAR will not automatically configure it.
> ************************************ WARNING
> ************************************
> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for
> ntpconfig
> Shutting down ntpd: [  OK  ]
> Starting ntpd: [  OK  ]
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> --> About to run /opt/oscar/packages/opium/scripts/post_install for opium
> Not all hosts were accessible by c3! Will retry the update later
> Could not find template for file switcher.ini
> If this contains distro-specific lines, please create a template!
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> Could not find template for file gshadow
> If this contains distro-specific lines, please create a template!
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
> generating the /etc/odaserver file on all oscar clients
> . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH
> Server
> Current FSM is SSH_Main_SSHProcess
> Cluster setup complete!
> --> Step 7: Successfully completed the cluster install
> --> Update Wizard Env (as needed)
> -----------------------------------------------------------------------------------
> P.S.: I am using OSCAR 5 on CentOS 4.7 (x86_64). I can't use CentOS 5.x
> because it has problems with my graphics cards.
> Best regards.
>
> --
> A.Nazemian
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Oscar-users mailing list
> Oscar-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>
>
