Hi again,
As I said before:
OS: CentOS 4.7
Ethernet cards: 2 onboard Ethernet cards, Intel PRO/1000 I think.
CPU: Xeon quad-core 5100 family, 2.66 GHz, 12 MB cache.
About ssh to the head node: I can't check that, because I don't have any OS
installed on the client nodes. How can I run "ssh 192.168.0.1" on the client
nodes without any command environment on them?! But as I said, I tried to ssh
to the client node from the server and the "no route to host" message showed up.
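
To tell the different ssh failures in this thread apart ("no route to host" versus "Connection timed out"), here is a minimal Python sketch that probes TCP port 22 from the head node and maps the error to a short diagnosis. The function names are hypothetical, and the diagnosis strings are only the usual explanations, not something OSCAR reports itself.

```python
import errno
import socket


def classify_connect_error(err: OSError) -> str:
    """Map a failed TCP connect to a short, human-readable diagnosis."""
    if isinstance(err, socket.timeout) or err.errno == errno.ETIMEDOUT:
        return "timed out: packets are dropped or nothing answers at that address"
    if err.errno == errno.EHOSTUNREACH:
        # Usually no ARP reply on the local segment, or a firewall that
        # rejects the connection with an ICMP unreachable/prohibited message.
        return "no route to host: node is down, not imaged, or a firewall rejects"
    if err.errno == errno.ECONNREFUSED:
        return "refused: the host is up but nothing listens on the port"
    return f"other error: {err}"


def probe(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Try to open a TCP connection and report what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except OSError as err:
        return classify_connect_error(err)


# Example (run on the head node): print(probe("192.168.0.2"))
```

If the probe reports "open" although the node has no OS installed, some other device is probably answering on that address.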
For collecting the MAC addresses I just used "start collecting mac address" and
it found all of them fine, and about "Setup Network Boot": yes, I did that. As
I said, all of the steps except step 7 completed successfully.
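
Since the client stops with a TFTP open timeout, one way to check the TFTP server independently of the PXE ROM is to send it a read request from another machine on the cluster switch. The following is a minimal Python sketch, assuming the boot file is called pxelinux.0 (an assumption, use whatever file your DHCP configuration actually hands out). The function names are hypothetical; the packet layout follows RFC 1350.

```python
import socket
import struct

TFTP_PORT = 69
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def describe_reply(packet: bytes) -> str:
    """Interpret the first packet the server sends back."""
    if len(packet) < 4:
        return "malformed reply"
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode == OP_DATA:
        return f"server is serving the file (data block {code})"
    if opcode == OP_ERROR:
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"server answered with error {code}: {message}"
    return f"unexpected opcode {opcode}"


def probe_tftp(server: str, filename: str, timeout: float = 5.0) -> str:
    """Send one read request over UDP and report the server's reaction."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (server, TFTP_PORT))
            packet, _ = sock.recvfrom(1024)
        except socket.timeout:
            return "no reply: TFTP server not running, or UDP port 69 is filtered"
        except OSError as err:
            return f"socket error: {err}"
    return describe_reply(packet)


# Example: print(probe_tftp("192.168.0.1", "pxelinux.0"))
```

A "no reply" result here would match what the client shows and points at the TFTP service or a firewall on the head node rather than at the nodes.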
I set the hosts.allow file to ALL:ALL and I haven't seen any difference.
Yes, SELinux is disabled.
Is there any chance my switch doesn't support clustering?!
Please read my posts again; I think all of the needed information was posted
before...
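
As a sanity check on the addresses quoted further down (client 192.168.0.2, mask 255.255.255.0, DHCP server and gateway 192.168.0.1), here is a small Python sketch that confirms two addresses sit in the same subnet, so that "no route to host" cannot be explained by a plain addressing mistake. The helper name is hypothetical.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """Return True if ip_b lies in the network that ip_a/netmask belongs to."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Example: print(same_subnet("192.168.0.2", "192.168.0.1", "255.255.255.0"))
```

With the values from the client screen the two addresses are in the same /24, so routing is not needed between them and the problem has to be on the local segment.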
On Thu, Oct 30, 2008 at 4:30 PM, Michael Edwards <[EMAIL PROTECTED]> wrote:
> It sounds like the nodes did not image properly. There are many many
> possible problems, and I am not entirely sure I understand what you
> have done so I am going to ask many questions.
>
> What hardware are you using? Specifically interested in OS, network
> cards, and cpu types.
>
> 192.168.0.1 is a very common default address for networking equipment.
> Can you ssh to 192.168.0.1 from another computer on the cluster
> switch (move one to the switch if the nodes arent booting).
>
> Anyway, let me see if I understand what you have done. You have
> created your image on the head node. Then you have run the "Setup
> Networking" Step, and successfully collected the MAC addresses of the
> nodes? or did you enter them in from a file?
> There is a button on that screen called "Setup Network Boot" which is
> necessary, did you click on that? If not then the nodes will not
> image properly.
>
> Then did you network boot the nodes again, with a monitor attached and
> see what happened? Do you have the firewall on the head node turned
> off? SE Linux off?
>
>
>
> On Wed, Oct 29, 2008 at 5:11 AM, ali nazemian <[EMAIL PROTECTED]> wrote:
> > Hi again.
> > Let me explain about my problem more:
> > here is the result on client node:
> > client mac addr: XX XX XX XX XX XX ...
> > client ip: 192.168.0.2 mask: 255.255.255.0 dhcp ip: 192.168.0.1
> > gateway ip: 192.168.0.1
> > pxe-e32: tftp open timeout.
> > ...
> > and same time on the server node:
> > ssh:connect to host oscarnode1.clusternet port 22: connection time out
> > i use "nmap -a" to see which port is open and which is not, here is the
> > result:
> > starting nmap 3.70 (http://...) at 2008-10-29 14:12 IRST
> > no target machines/network specified!
> > quitting!
> > i couldnt use "ssh 192.168.0.1" on the client node , cause of i havent any
> > command environment there to type any command , so i use "ssh 192.168.0.2"
> > on server and the result was: ssh: connect to host 192.168.0.2 port 22: no
> > route to host
> > i used cd boot instead of network boot , same result appeared:
> > connect to host 192.168.0.2 port 22: no route to host
> >
> > It seems i have a problem with my network not OSCAR, what do u think?!
> >
> > On Wed, Oct 29, 2008 at 1:36 AM, ali nazemian <[EMAIL PROTECTED]> wrote:
> >>
> >> I use 192.168.x.x just for test , and as a pilot this network didnt
> >> connect to other networks so in this case 192.168.x.x shouldnt be a problem
> >> , however i will use 10.0.x.x for final clustering network.
> >> Now i haven't access to those nodes , so i'll test ssh tomorrow and let
> >> you know what result i get.
> >>
> >> On Wed, Oct 29, 2008 at 12:03 AM, Michael Edwards <[EMAIL PROTECTED]> wrote:
> >>>
> >>> It looks like the nodes didn't image, or are now unable to communicate
> >>> with the head node in any event.
> >>>
> >>> Plug in a monitor and keyboard to your compute node and see if you
> >>> have a login prompt. You should be able to log in as your root user
> >>> from the head node. If it allows you to log in there (you will need
> >>> the password) try running "ssh 192.168.0.1"
> >>>
> >>> It is quite possible that a switch or other device on your network is
> >>> using the 192.168.0.1 network. I generally use the 10.0.0.1 network
> >>> because of this.
> >>>
> >>> On Tue, Oct 28, 2008 at 4:18 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
> >>> > For your first question,it seems that all of steps before step 7
> >>> > successfully completed .
> >>> > and about your second one , i dont know how to check that.
> >>> > I think maybe its hardware problem for my switch , its 3com 24 port
> >>> > switch ,
> >>> > can it be my problem?!
> >>> >
> >>> > On Tue, Oct 28, 2008 at 11:30 PM, Michael Edwards <[EMAIL PROTECTED]> wrote:
> >>> >>
> >>> >> Did the image deployment complete successfully and the nodes reboot to
> >>> >> the oscar image?
> >>> >>
> >>> >> Can you ssh to the compute nodes from the head node (without getting a
> >>> >> password prompt)?
> >>> >>
> >>> >> On Tue, Oct 28, 2008 at 3:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
> >>> >> > Hi.
> >>> >> > I want to execute clustering for our HPC center using OSCAR, but i have a
> >>> >> > problem with step 7, installing cluster.
> >>> >> > Here is my problem :
> >>> >> > After i want to run step 7 , after some time on client node "tftp time out"
> >>> >> > error appeared and node terminate the boot agent. and "Received disconnect
> >>> >> > from 192.168.0.2: 2: The connection is closed by SSH Server
> >>> >> > Current FSM is SSH_Main_SSHProcess" appeared on server node.
> >>> >> > Here is the complete log of step 7:
> >>> >> >
> >>> >> >
> >>> >> >
> --------------------------------------------------------------------------
> >>> >> > --> Update Wizard Env (as needed)
> >>> >> > --> Step 7: Running: ./post_install
> >>> >> > Gathering processor count from oscarnode1.clusternet.
> >>> >> > ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
> >>> >> > Improper count (0) returned from machine oscarnode1.clusternet at ./post_install line 83
> >>> >> > main::get_numproc() called at ./post_install line 39
> >>> >> > ssh: connect to host oscarnode1 port 22: Connection timed out
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far) [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>> >> > --> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
> >>> >> > ************************* oscar_cluster *************************
> >>> >> > --------- oscarnode1---------
> >>> >> > ssh: connect to host oscarnode1 port 22: Connection timed out
> >>> >> > --> About to run /opt/oscar/packages/ganglia/scripts/post_install
> >>> >> > for
> >>> >> > ganglia
> >>> >> > [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
> >>> >> > Shutting down GANGLIA gmond: [  OK  ]
> >>> >> > Starting GANGLIA gmond: [  OK  ]
> >>> >> > editing /etc/gmetad.conf
> >>> >> > match: gridname\s+.*
> >>> >> > match: data_source\s+.*
> >>> >> > [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
> >>> >> > Shutting down GANGLIA gmetad: [  OK  ]
> >>> >> > Starting GANGLIA gmetad: [  OK  ]
> >>> >> > [ganglia] Starting up apache...
> >>> >> > Stopping httpd: [  OK  ]
> >>> >> > Starting httpd: [  OK  ]
> >>> >> > [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
> >>> >> > ************************* oscar_cluster *************************
> >>> >> > --------- oscarnode1---------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > --> About to run /opt/oscar/packages/torque/scripts/post_install
> for
> >>> >> > torque
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > TORQUE mom config file updated with clienthost: server.clusternet
> >>> >> > Pushing config file to clients...
> >>> >> > Sending SIGHUP to all moms...
> >>> >> > ************************* oscar_cluster *************************
> >>> >> > --------- oscarnode1---------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > [torque] Updating pbs_server nodes
> >>> >> > /opt/pbs/bin/pbsnodes: Server has no node list
> >>> >> > Shutting down TORQUE Server: [  OK  ]
> >>> >> > Starting TORQUE Server: [  OK  ]
> >>> >> > [torque] Creating TORQUE workq queue...
> >>> >> > Max open servers: 4
> >>> >> > set queue workq resources_max.ncpus = 0
> >>> >> > set queue workq resources_max.nodect = 0
> >>> >> > set queue workq resources_available.nodect = 0
> >>> >> > set server resources_available.ncpus = 0
> >>> >> > set server resources_available.nodect = 0
> >>> >> > set server resources_available.nodes = 0
> >>> >> > set server resources_max.ncpus = 0
> >>> >> > set server resources_max.nodes = 0
> >>> >> > set server scheduler_iteration = 60
> >>> >> > set server log_events = 64
> >>> >> > Shutting down MAUI Scheduler: [  OK  ]
> >>> >> > Starting MAUI Scheduler: [  OK  ]
> >>> >> > --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
> >>> >> > Setting default for tag mpi ("lam-7.1.2")
> >>> >> > Attribute successfully set; new attribute setting will be
> effective
> >>> >> > for
> >>> >> > future shells
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > --> About to run
> /opt/oscar/packages/mta-config/scripts/post_install
> >>> >> > for
> >>> >> > mta-config
> >>> >> > ************************************ WARNING
> >>> >> > ************************************
> >>> >> > OSCAR could not set up the configuration for any mailing service
> on
> >>> >> > the
> >>> >> > server.
> >>> >> > The current version of the mta-config package in OSCAR only
> supports
> >>> >> > the
> >>> >> > Postfix mail transfer agent (MTA).
> >>> >> > It looks like you have another MTA installed (e.g, sendmail or
> >>> >> > exim); as
> >>> >> > such,
> >>> >> > please be aware that OSCAR will not automatically configure it.
> >>> >> > ************************************ WARNING
> >>> >> > ************************************
> >>> >> > --> About to run
> /opt/oscar/packages/ntpconfig/scripts/post_install
> >>> >> > for
> >>> >> > ntpconfig
> >>> >> > Shutting down ntpd: [  OK  ]
> >>> >> > Starting ntpd: [  OK  ]
> >>> >> > ************************* oscar_cluster *************************
> >>> >> > --------- oscarnode1---------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > --> About to run /opt/oscar/packages/opium/scripts/post_install
> for
> >>> >> > opium
> >>> >> > Not all hosts were accessible by c3! Will retry the update later
> >>> >> > Could not find template for file switcher.ini
> >>> >> > If this contains distro-specific lines, please create a template!
> >>> >> > image:
> >>> >> > $VAR1 = 'oscarimage';
> >>> >> > ---------------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > Could not find template for file gshadow
> >>> >> > If this contains distro-specific lines, please create a template!
> >>> >> > image:
> >>> >> > $VAR1 = 'oscarimage';
> >>> >> > ---------------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > image:
> >>> >> > $VAR1 = 'oscarimage';
> >>> >> > ---------------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > image:
> >>> >> > $VAR1 = 'oscarimage';
> >>> >> > ---------------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > image:
> >>> >> > $VAR1 = 'oscarimage';
> >>> >> > ---------------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > rsync: connection unexpectedly closed (0 bytes received so far)
> >>> >> > [sender]
> >>> >> > rsync error: error in rsync protocol data stream (code 12) at
> >>> >> > io.c(359)
> >>> >> > --> About to run /opt/oscar/packages/oda/scripts/post_install for
> >>> >> > oda
> >>> >> > generating the /etc/odaserver file on all oscar clients
> >>> >> > . /etc/profile.d/c3.sh && cexec 'echo oscar_server >
> /etc/odaserver'
> >>> >> > ************************* oscar_cluster *************************
> >>> >> > --------- oscarnode1---------
> >>> >> > Received disconnect from 192.168.0.2: 2: The connection is closed
> by
> >>> >> > SSH
> >>> >> > Server
> >>> >> > Current FSM is SSH_Main_SSHProcess
> >>> >> > Cluster setup complete!
> >>> >> > --> Step 7: Successfully completed the cluster install
> >>> >> > --> Update Wizard Env (as needed)
> >>> >> >
> >>> >> >
> >>> >> >
> -----------------------------------------------------------------------------------
> >>> >> > P.S: i am using OSCAR 5 on centos 4.7-x86_64 , i cant use centos 5.X because
> >>> >> > of it has problem with my graphic cards.
> >>> >> > Best regards.
> >>> >> >
> >>> >> > --
> >>> >> > A.Nazemian
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> >
> -------------------------------------------------------------------------
> >>> >> > This SF.Net email is sponsored by the Moblin Your Move Developer's
> >>> >> > challenge
> >>> >> > Build the coolest Linux based applications with Moblin SDK & win
> >>> >> > great
> >>> >> > prizes
> >>> >> > Grand prize is a trip for two to an Open Source event anywhere in
> >>> >> > the
> >>> >> > world
> >>> >> > http://moblin-contest.org/redirect.php?banner_id=100&url=/
> >>> >> > _______________________________________________
> >>> >> > Oscar-users mailing list
> >>> >> > Oscar-users@lists.sourceforge.net
> >>> >> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > A.Nazemian
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> A.Nazemian
> >
> >
> >
> > --
> > A.Nazemian
> >
> >
> >
>
>
--
A.Nazemian