It looks like the nodes either were not imaged or, in any event, can no longer communicate with the head node.
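If you want a quick check from the head node first, something like the following would tell you whether the node answers on the SSH port at all. This is only a sketch: it assumes Python 3 is available on the head node, and the function name port_open and the 3 second default timeout are my own choices, not anything that ships with OSCAR.

```python
import socket


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name resolution failures, refused connections and timeouts all
    count as "not reachable" and return False.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror and socket.timeout are both subclasses of OSError
        return False
```

Calling port_open("oscarnode1") or port_open("192.168.0.2") from the head node should return True once the node is up and sshd is listening. A False result only says the port did not answer; it does not distinguish a node that never imaged from a network problem.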
Plug a monitor and keyboard into your compute node and see whether you get a login prompt. You should be able to log in with the root account from the head node (you will need the password). If it lets you log in there, try running "ssh 192.168.0.1".

It is quite possible that a switch or other device on your network is already using the 192.168.0.1 address. I generally use the 10.0.0.x range instead because of this.

On Tue, Oct 28, 2008 at 4:18 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
> For your first question, it seems that all of the steps before step 7
> completed successfully.
> As for your second one, I don't know how to check that.
> I think maybe it's a hardware problem with my switch. It's a 3com 24 port
> switch; can it be my problem?!
>
> On Tue, Oct 28, 2008 at 11:30 PM, Michael Edwards <[EMAIL PROTECTED]> wrote:
>>
>> Did the image deployment complete successfully and the nodes reboot to
>> the oscar image?
>>
>> Can you ssh to the compute nodes from the head node (without getting a
>> password prompt)?
>>
>> On Tue, Oct 28, 2008 at 3:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
>> > Hi.
>> > I want to set up clustering for our HPC center using OSCAR, but I have a
>> > problem with step 7, installing cluster.
>> > Here is my problem:
>> > When I run step 7, after some time a "tftp time out" error appears on the
>> > client node and the node terminates the boot agent, and "Received disconnect
>> > from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess" appears on the server node.
>> > Here is the complete log of step 7:
>> >
>> > --------------------------------------------------------------------------
>> > --> Update Wizard Env (as needed)
>> > --> Step 7: Running: ./post_install
>> > Gathering processor count from oscarnode1.clusternet.
>> > ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
>> > Improper count (0) returned from machine oscarnode1.clusternet at ./post_install line 83
>> > main::get_numproc() called at ./post_install line 39
>> > ssh: connect to host oscarnode1 port 22: Connection timed out
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > --> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
>> > ************************* oscar_cluster *************************
>> > --------- oscarnode1---------
>> > ssh: connect to host oscarnode1 port 22: Connection timed out
>> > --> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
>> > [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
>> > Shutting down GANGLIA gmond: [  OK  ]
>> > Starting GANGLIA gmond: [  OK  ]
>> > editing /etc/gmetad.conf
>> > match: gridname\s+.*
>> > match: data_source\s+.*
>> > [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
>> > Shutting down GANGLIA gmetad: [  OK  ]
>> > Starting GANGLIA gmetad: [  OK  ]
>> > [ganglia] Starting up apache...
>> > Stopping httpd: [  OK  ]
>> > Starting httpd: [  OK  ]
>> > [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
>> > ************************* oscar_cluster *************************
>> > --------- oscarnode1---------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > --> About to run /opt/oscar/packages/torque/scripts/post_install for torque
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > TORQUE mom config file updated with clienthost: server.clusternet
>> > Pushing config file to clients...
>> > Sending SIGHUP to all moms...
>> > ************************* oscar_cluster *************************
>> > --------- oscarnode1---------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > [torque] Updating pbs_server nodes
>> > /opt/pbs/bin/pbsnodes: Server has no node list
>> > Shutting down TORQUE Server: [  OK  ]
>> > Starting TORQUE Server: [  OK  ]
>> > [torque] Creating TORQUE workq queue...
>> > Max open servers: 4
>> > set queue workq resources_max.ncpus = 0
>> > set queue workq resources_max.nodect = 0
>> > set queue workq resources_available.nodect = 0
>> > set server resources_available.ncpus = 0
>> > set server resources_available.nodect = 0
>> > set server resources_available.nodes = 0
>> > set server resources_max.ncpus = 0
>> > set server resources_max.nodes = 0
>> > set server scheduler_iteration = 60
>> > set server log_events = 64
>> > Shutting down MAUI Scheduler: [  OK  ]
>> > Starting MAUI Scheduler: [  OK  ]
>> > --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
>> > Setting default for tag mpi ("lam-7.1.2")
>> > Attribute successfully set; new attribute setting will be effective for future shells
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > --> About to run /opt/oscar/packages/mta-config/scripts/post_install for mta-config
>> > ************************************ WARNING ************************************
>> > OSCAR could not set up the configuration for any mailing service on the server.
>> > The current version of the mta-config package in OSCAR only supports the Postfix mail transfer agent (MTA).
>> > It looks like you have another MTA installed (e.g, sendmail or exim); as such,
>> > please be aware that OSCAR will not automatically configure it.
>> > ************************************ WARNING ************************************
>> > --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
>> > Shutting down ntpd: [  OK  ]
>> > Starting ntpd: [  OK  ]
>> > ************************* oscar_cluster *************************
>> > --------- oscarnode1---------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > --> About to run /opt/oscar/packages/opium/scripts/post_install for opium
>> > Not all hosts were accessible by c3! Will retry the update later
>> > Could not find template for file switcher.ini
>> > If this contains distro-specific lines, please create a template!
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > Could not find template for file gshadow
>> > If this contains distro-specific lines, please create a template!
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > image:
>> > $VAR1 = 'oscarimage';
>> > ---------------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> > --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
>> > generating the /etc/odaserver file on all oscar clients
>> > . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
>> > ************************* oscar_cluster *************************
>> > --------- oscarnode1---------
>> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> > Current FSM is SSH_Main_SSHProcess
>> > Cluster setup complete!
>> > --> Step 7: Successfully completed the cluster install
>> > --> Update Wizard Env (as needed)
>> >
>> > -----------------------------------------------------------------------------------
>> > P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x
>> > because it has a problem with my graphics cards.
>> > Best regards.
>> >
>> > --
>> > A.Nazemian
>> >
>> > -------------------------------------------------------------------------
>> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>> > Build the coolest Linux based applications with Moblin SDK & win great prizes
>> > Grand prize is a trip for two to an Open Source event anywhere in the world
>> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> > _______________________________________________
>> > Oscar-users mailing list
>> > Oscar-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/oscar-users
>
> --
> A.Nazemian