Did the image deployment complete successfully, and did the nodes reboot into the OSCAR image?
Can you ssh to the compute nodes from the head node (without getting a password prompt)?

On Tue, Oct 28, 2008 at 3:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
> Hi.
> I want to set up a cluster for our HPC center using OSCAR, but I have a
> problem with step 7, installing the cluster.
> Here is my problem:
> When I run step 7, after some time a "tftp time out" error appears on the
> client node and the node terminates the boot agent. On the server node,
> "Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess" appears.
> Here is the complete log of step 7:
> --------------------------------------------------------------------------
> --> Update Wizard Env (as needed)
> --> Step 7: Running: ./post_install
> Gathering processor count from oscarnode1.clusternet.
> ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
> Improper count (0) returned from machine oscarnode1.clusternet at ./post_install line 83
> main::get_numproc() called at ./post_install line 39
> ssh: connect to host oscarnode1 port 22: Connection timed out
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> ssh: connect to host oscarnode1 port 22: Connection timed out
> --> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
> [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
> Shutting down GANGLIA gmond: [  OK  ]
> Starting GANGLIA gmond: [  OK  ]
> editing /etc/gmetad.conf
> match: gridname\s+.*
> match: data_source\s+.*
> [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
> Shutting down GANGLIA gmetad: [  OK  ]
> Starting GANGLIA gmetad: [  OK  ]
> [ganglia] Starting up apache...
> Stopping httpd: [  OK  ]
> Starting httpd: [  OK  ]
> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> --> About to run /opt/oscar/packages/torque/scripts/post_install for torque
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> TORQUE mom config file updated with clienthost: server.clusternet
> Pushing config file to clients...
> Sending SIGHUP to all moms...
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> [torque] Updating pbs_server nodes
> /opt/pbs/bin/pbsnodes: Server has no node list
> Shutting down TORQUE Server: [  OK  ]
> Starting TORQUE Server: [  OK  ]
> [torque] Creating TORQUE workq queue...
> Max open servers: 4
> set queue workq resources_max.ncpus = 0
> set queue workq resources_max.nodect = 0
> set queue workq resources_available.nodect = 0
> set server resources_available.ncpus = 0
> set server resources_available.nodect = 0
> set server resources_available.nodes = 0
> set server resources_max.ncpus = 0
> set server resources_max.nodes = 0
> set server scheduler_iteration = 60
> set server log_events = 64
> Shutting down MAUI Scheduler: [  OK  ]
> Starting MAUI Scheduler: [  OK  ]
> --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
> Setting default for tag mpi ("lam-7.1.2")
> Attribute successfully set; new attribute setting will be effective for future shells
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/mta-config/scripts/post_install for mta-config
> ************************************ WARNING ************************************
> OSCAR could not set up the configuration for any mailing service on the server.
> The current version of the mta-config package in OSCAR only supports the
> Postfix mail transfer agent (MTA).
> It looks like you have another MTA installed (e.g, sendmail or exim); as such,
> please be aware that OSCAR will not automatically configure it.
> ************************************ WARNING ************************************
> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
> Shutting down ntpd: [  OK  ]
> Starting ntpd: [  OK  ]
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> --> About to run /opt/oscar/packages/opium/scripts/post_install for opium
> Not all hosts were accessible by c3! Will retry the update later
> Could not find template for file switcher.ini
> If this contains distro-specific lines, please create a template!
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> Could not find template for file gshadow
> If this contains distro-specific lines, please create a template!
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> image:
> $VAR1 = 'oscarimage';
> ---------------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
> generating the /etc/odaserver file on all oscar clients
> . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
> ************************* oscar_cluster *************************
> --------- oscarnode1---------
> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> Current FSM is SSH_Main_SSHProcess
> Cluster setup complete!
> --> Step 7: Successfully completed the cluster install
> --> Update Wizard Env (as needed)
> -----------------------------------------------------------------------------------
> P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x because
> it has problems with my graphics cards.
> Best regards.
>
> --
> A.Nazemian
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Oscar-users mailing list
> Oscar-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oscar-users