It sounds like the nodes did not image properly. There are many possible causes, and I am not entirely sure I understand what you have done, so I am going to ask a lot of questions.
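One quick check before the questions: from the head node you can test whether anything is accepting connections on a node's ssh port. This is only a rough sketch, assuming a Python 3 interpreter is available on the machine you run it from (the stock Python on CentOS 4 is too old for this syntax). `port_open` is a helper name I made up for illustration. It only tests TCP, so it tells you nothing about TFTP, which runs over UDP.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Returns False on refusal, timeout, no route to host, or a name that
    does not resolve (all of these are raised as OSError subclasses).
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("192.168.0.2", 22)` checks the compute node from this thread, and `port_open("192.168.0.1", 22)` checks the head node address from another machine on the switch.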
What hardware are you using? I am specifically interested in the OS, network cards, and CPU types.

192.168.0.1 is a very common default address for networking equipment. Can you ssh to 192.168.0.1 from another computer on the cluster switch? (Move one to the switch if the nodes aren't booting.)

Anyway, let me see if I understand what you have done. You created your image on the head node. Then you ran the "Setup Networking" step and successfully collected the MAC addresses of the nodes? Or did you enter them from a file?

There is a button on that screen called "Setup Network Boot", which is necessary. Did you click on it? If not, the nodes will not image properly.

Then did you network boot the nodes again with a monitor attached and see what happened?

Do you have the firewall on the head node turned off? SELinux off?

On Wed, Oct 29, 2008 at 5:11 AM, ali nazemian <[EMAIL PROTECTED]> wrote:
> Hi again.
> Let me explain about my problem more:
> here is the result on client node:
> client mac addr: XX XX XX XX XX XX ...
> client ip: 192.168.0.2 mask: 255.255.255.0 dhcp ip: 192.168.0.1
> gateway ip: 192.168.0.1
> pxe-e32: tftp open timeout.
> ...
> and same time on the server node:
> ssh:connect to host oscarnode1.clusternet port 22: connection time out
> i use "nmap -a" to see which port is open and which is not, here is the
> result:
> starting nmap 3.70 (http://...) at 2008-10-29 14:12 IRST
> no target machines/network specified!
> quitting!
> i couldnt use "ssh 192.168.0.1" on the client node , cause of i havent any
> command environment there to type any command , so i use "ssh 192.168.0.2"
> on server and the result was: ssh: connect to host 192.168.0.2 port 22: no
> route to host
> i used cd boot instead of network boot , same result appeared:
> connect to host 192.168.0.2 port 22: no route to host
>
> It seems i have a problem with my network not OSCAR, what do u think?!
>
> On Wed, Oct 29, 2008 at 1:36 AM, ali nazemian <[EMAIL PROTECTED]> wrote:
>>
>> I use 192.168.x.x just for test , and as a pilot this network didnt
>> connect to other networks so in this case 192.168.x.x shouldnt be a problem
>> , however i will use 10.0.x.x for final clustering network.
>> Now i haven't access to those nodes , so i'll test ssh tomorrow and let
>> you know what result i get.
>>
>> On Wed, Oct 29, 2008 at 12:03 AM, Michael Edwards <[EMAIL PROTECTED]> wrote:
>>>
>>> It looks like the nodes didn't image, or are now unable to communicate
>>> with the head node in any event.
>>>
>>> Plug in a monitor and keyboard to your compute node and see if you
>>> have a login prompt. You should be able to log in as your root user
>>> from the head node. If it allows you to log in there (you will need
>>> the password) try running "ssh 192.168.0.1"
>>>
>>> It is quite possible that a switch or other device on your network is
>>> using the 192.168.0.1 network. I generally use the 10.0.0.1 network
>>> because of this.
>>>
>>> On Tue, Oct 28, 2008 at 4:18 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
>>> > For your first question,it seems that all of steps before step 7
>>> > successfully completed .
>>> > and about your second one , i dont know how to check that.
>>> > I think maybe its hardware problem for my switch , its 3com 24 port switch ,
>>> > can it be my problem?!
>>> >
>>> > On Tue, Oct 28, 2008 at 11:30 PM, Michael Edwards <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >> Did the image deployment complete successfully and the nodes reboot to
>>> >> the oscar image?
>>> >>
>>> >> Can you ssh to the compute nodes from the head node (without getting a
>>> >> password prompt)?
>>> >>
>>> >> On Tue, Oct 28, 2008 at 3:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
>>> >> > Hi.
>>> >> > I want to execute clustering for our HPC center using OSCAR, but i have a
>>> >> > problem with step 7, installing cluster.
>>> >> > Here is my problem :
>>> >> > After i want to run step 7 , after some time on client node "tftp time out"
>>> >> > error appeared and node terminate the boot agent. and "Received disconnect
>>> >> > from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess" appeared on server node.
>>> >> > Here is the complete log of step 7:
>>> >> >
>>> >> >
>>> >> > --------------------------------------------------------------------------
>>> >> > --> Update Wizard Env (as needed)
>>> >> > --> Step 7: Running: ./post_install
>>> >> > Gathering processor count from oscarnode1.clusternet.
>>> >> > ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
>>> >> > Improper count (0) returned from machine oscarnode1.clusternet at ./post_install line 83
>>> >> > main::get_numproc() called at ./post_install line 39
>>> >> > ssh: connect to host oscarnode1 port 22: Connection timed out
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > --> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
>>> >> > ************************* oscar_cluster *************************
>>> >> > --------- oscarnode1---------
>>> >> > ssh: connect to host oscarnode1 port 22: Connection timed out
>>> >> > --> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
>>> >> > [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
>>> >> > Shutting down GANGLIA gmond: [  OK  ]
>>> >> > Starting GANGLIA gmond: [  OK  ]
>>> >> > editing /etc/gmetad.conf
>>> >> > match: gridname\s+.*
>>> >> > match: data_source\s+.*
>>> >> > [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
>>> >> > Shutting down GANGLIA gmetad: [  OK  ]
>>> >> > Starting GANGLIA gmetad: [  OK  ]
>>> >> > [ganglia] Starting up apache...
>>> >> > Stopping httpd: [  OK  ]
>>> >> > Starting httpd: [  OK  ]
>>> >> > [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
>>> >> > ************************* oscar_cluster *************************
>>> >> > --------- oscarnode1---------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > --> About to run /opt/oscar/packages/torque/scripts/post_install for torque
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > TORQUE mom config file updated with clienthost: server.clusternet
>>> >> > Pushing config file to clients...
>>> >> > Sending SIGHUP to all moms...
>>> >> > ************************* oscar_cluster *************************
>>> >> > --------- oscarnode1---------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > [torque] Updating pbs_server nodes
>>> >> > /opt/pbs/bin/pbsnodes: Server has no node list
>>> >> > Shutting down TORQUE Server: [  OK  ]
>>> >> > Starting TORQUE Server: [  OK  ]
>>> >> > [torque] Creating TORQUE workq queue...
>>> >> > Max open servers: 4
>>> >> > set queue workq resources_max.ncpus = 0
>>> >> > set queue workq resources_max.nodect = 0
>>> >> > set queue workq resources_available.nodect = 0
>>> >> > set server resources_available.ncpus = 0
>>> >> > set server resources_available.nodect = 0
>>> >> > set server resources_available.nodes = 0
>>> >> > set server resources_max.ncpus = 0
>>> >> > set server resources_max.nodes = 0
>>> >> > set server scheduler_iteration = 60
>>> >> > set server log_events = 64
>>> >> > Shutting down MAUI Scheduler: [  OK  ]
>>> >> > Starting MAUI Scheduler: [  OK  ]
>>> >> > --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
>>> >> > Setting default for tag mpi ("lam-7.1.2")
>>> >> > Attribute successfully set; new attribute setting will be effective for future shells
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > --> About to run /opt/oscar/packages/mta-config/scripts/post_install for mta-config
>>> >> > ************************************ WARNING ************************************
>>> >> > OSCAR could not set up the configuration for any mailing service on the server.
>>> >> > The current version of the mta-config package in OSCAR only supports the
>>> >> > Postfix mail transfer agent (MTA).
>>> >> > It looks like you have another MTA installed (e.g, sendmail or exim); as such,
>>> >> > please be aware that OSCAR will not automatically configure it.
>>> >> > ************************************ WARNING ************************************
>>> >> > --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
>>> >> > Shutting down ntpd: [  OK  ]
>>> >> > Starting ntpd: [  OK  ]
>>> >> > ************************* oscar_cluster *************************
>>> >> > --------- oscarnode1---------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > --> About to run /opt/oscar/packages/opium/scripts/post_install for opium
>>> >> > Not all hosts were accessible by c3! Will retry the update later
>>> >> > Could not find template for file switcher.ini
>>> >> > If this contains distro-specific lines, please create a template!
>>> >> > image:
>>> >> > $VAR1 = 'oscarimage';
>>> >> > ---------------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > Could not find template for file gshadow
>>> >> > If this contains distro-specific lines, please create a template!
>>> >> > image:
>>> >> > $VAR1 = 'oscarimage';
>>> >> > ---------------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > image:
>>> >> > $VAR1 = 'oscarimage';
>>> >> > ---------------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > image:
>>> >> > $VAR1 = 'oscarimage';
>>> >> > ---------------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > image:
>>> >> > $VAR1 = 'oscarimage';
>>> >> > ---------------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>> >> > --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
>>> >> > generating the /etc/odaserver file on all oscar clients
>>> >> > . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
>>> >> > ************************* oscar_cluster *************************
>>> >> > --------- oscarnode1---------
>>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >> > Current FSM is SSH_Main_SSHProcess
>>> >> > Cluster setup complete!
>>> >> > --> Step 7: Successfully completed the cluster install
>>> >> > --> Update Wizard Env (as needed)
>>> >> >
>>> >> >
>>> >> > -----------------------------------------------------------------------------------
>>> >> > P.S: i am using OSCAR 5 on centos 4.7-x86_64 , i cant use centos 5.X because
>>> >> > of it has problem with my graphic cards.
>>> >> > Best regards.
>>> >> >
>>> >> > --
>>> >> > A.Nazemian
>>> >> >
>>> >> > -------------------------------------------------------------------------
>>> >> > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>>> >> > Build the coolest Linux based applications with Moblin SDK & win great prizes
>>> >> > Grand prize is a trip for two to an Open Source event anywhere in the world
>>> >> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> >> > _______________________________________________
>>> >> > Oscar-users mailing list
>>> >> > Oscar-users@lists.sourceforge.net
>>> >> > https://lists.sourceforge.net/lists/listinfo/oscar-users
>>> >> >
>>> >
>>> > --
>>> > A.Nazemian
>>
>> --
>> A.Nazemian
>
> --
> A.Nazemian