Since we have just 22 nodes, I think the rescue disk would be reasonable.
I'll do that.
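Before I try it on a node, this is roughly what I expect the rescue CD step to look like. It is only a sketch under my own assumptions: /dev/sda6 is the root partition from the client's GRUB entry quoted below, /mnt/node is a mount point I made up, and the location of the rescue environment's modprobe.conf is a guess that I pass in as an argument. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: replace the OSCAR-generated modprobe.conf on a node's disk with the
# one the rescue environment is using. Dry run by default (DRY_RUN=1), so the
# commands are only printed; set DRY_RUN=0 to really run them as root.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fix_modprobe() {
    root_dev=$1            # node's root partition, e.g. /dev/sda6
    src_conf=$2            # rescue environment's modprobe.conf (assumed path)
    mnt=${3:-/mnt/node}    # hypothetical mount point

    run mkdir -p "$mnt"
    run mount "$root_dev" "$mnt"
    # Keep the OSCAR version around in case the new one is worse.
    run cp -p "$mnt/etc/modprobe.conf" "$mnt/etc/modprobe.conf.oscar"
    run cp "$src_conf" "$mnt/etc/modprobe.conf"
    run umount "$mnt"
}
```

Called as `fix_modprobe /dev/sda6 /etc/modprobe.conf`, it prints the five commands in order. I am not sure whether the initrd also has to be rebuilt afterwards, so please correct me if this is not enough.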
Also, I want to install some software on the nodes, such as Fluent, Ansys,
etc. How can I put these packages in the image file instead of installing them
on the nodes one by one?
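My own guess is that the image is just a directory tree on the head node (I assume /var/lib/systemimager/images/oscarimage, since the step 7 log mentions 'oscarimage'), so maybe an installer can be run inside it with chroot. A rough sketch of what I mean, where the installer directory and script name are placeholders I invented and nothing is executed unless DRY_RUN=0:

```shell
#!/bin/sh
# Sketch: run a vendor installer inside the OSCAR image tree via chroot.
# IMAGE_DIR is an assumed location; the installer directory and script name
# passed to install_into_image are hypothetical.
IMAGE_DIR=${IMAGE_DIR:-/var/lib/systemimager/images/oscarimage}

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_into_image() {
    src=$1                      # unpacked installer directory on the head node
    script=$2                   # installer script inside that directory
    name=$(basename "$src")
    staged="/opt/$name-installer"   # path as seen from inside the image

    # Stage the installer inside the image so it is visible after chroot.
    run cp -a "$src" "$IMAGE_DIR$staged"
    # Run the installer with the image as its root filesystem.
    run chroot "$IMAGE_DIR" "$staged/$script"
    # Remove the staged copy so it is not pushed to every node.
    run rm -rf "$IMAGE_DIR$staged"
}
```

For example, `install_into_image /root/fluent install.sh` prints the cp, chroot and rm commands. Is this a sane way to do it, or is there a better OSCAR method?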
Cheers.
On Tue, Nov 4, 2008 at 1:05 AM, Michael Edwards <[EMAIL PROTECTED]> wrote:
> hd0 is just a label for one of the hard drives, that is fairly normal.
> For some reason the nodes aren't liking the drivers that OSCAR picked
> for your nodes to use.
>
> What you'll need to do is boot a rescue CD on a node and copy the
> modprobe.conf file it uses (assuming it boots properly and can mount
> the node's disk) to overwrite the one made by OSCAR.
>
> Then use the tips here
> (http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipLDAP) to
> change the OSCAR files used when imaging the nodes for future nodes.
>
> Then depending on if you have a lot of nodes, you can reimage the
> nodes or just repeat the rescue CD booting. Fix the modprobe in the
> images though, in case you reimage them later.
>
> Generally, the way I work is to make changes to a node, note what
> changes I made that worked, update the image with those changes, then
> reimage the cluster if possible. This depends on how many users you
> have of course :)
>
> On Sun, Nov 2, 2008 at 4:08 AM, Ali Nazemian <[EMAIL PROTECTED]>
> wrote:
> > Sorry, I forgot to write something in my last post, so please add this
> > to it.
> > On the client node's boot menu I have two choices:
> > 2.6.9-78.ELsmp_(hd0,0)
> > 2.6.9-78.EL_(hd0,0)
> > What does hd0 mean? I have not seen anything like this in a Linux boot
> > menu before. Anyway, I tried to edit the "2.6.9-78.ELsmp_(hd0,0)" entry,
> > and a new page with three lines showed up:
> > root (hd0,0)
> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6
> > initrd /sc-initrd-2.6.9-78.ELsmp.gz
> > I checked the server node, and there the boot menu entry (edit
> > command) looks like this:
> > root (hd0,0)
> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb...
> > initrd /initrd-2.6.9-78.ELsmp.img
> > Is this usual, or am I in trouble?
> > FYI: I used the UYOK method.
> > Cheers.
> >
> > On Sun, Nov 2, 2008 at 12:15 PM, Ali Nazemian <[EMAIL PROTECTED]>
> > wrote:
> >>
> >> Hi again,
> >> I set the image post-install action to "beep" and started the
> >> installation. After a while the installation on the client node seemed
> >> to finish and it was waiting for a reboot, so I reset the client node and
> >> changed the boot priority to hard disk first. The client node then tried
> >> to boot CentOS from the hard disk, but these errors showed up:
> >>
> >> Mounted /proc filesystem
> >> Mounting sysfs
> >> Creating /dev
> >> Starting udev
> >> Loading jbd.ko module
> >> Loading ext3.ko module
> >> Creating root device
> >> Mounting root filesystem
> >> mount:error 6 mounting ext3
> >> mount:error 2 mounting none
> >> Switching to new root
> >> switchroot:mount failed:22
> >> unmount /initrd/dev failed:2
> >> kernel panic - not syncing : Attempted to kill init!
> >> -------------------
> >> On the server node, in the monitoring page, the installation of this node
> >> is still green and its progress is "beeping", so it seems it was
> >> successful up to this point.
> >> What is the problem? What should I do now?
> >> Cheers.
> >>
> >> On Sat, Nov 1, 2008 at 11:41 PM, Ali Nazemian <[EMAIL PROTECTED]>
> >> wrote:
> >>>
> >>> I checked what you said, and it turned out the firewall was enabled.
> >>> SSH was allowed, but the firewall was still on, and disabling it solved
> >>> this problem. After that, some new errors showed up on the client node,
> >>> something about a partitioning problem that I think is related to the
> >>> ide.disk/scsi.disk file. So I have some questions about the imaging
> >>> process in the OSCAR installation that will probably help me install it
> >>> without errors:
> >>> 1- In the "build the image" step we have to choose a disk partition
> >>> file. We are told to choose scsi.disk for SCSI disks and ide.disk for
> >>> IDE disks, but what about SATA disks? As you know, IDE disks use the
> >>> hda naming and SCSI disks use sda, and I saw that SATA uses sda too,
> >>> so should I choose scsi.disk?
> >>> 2- In the same step we have to choose the IP assignment method. The
> >>> default value is static. Which one should I choose, static or dhcp?
> >>> Which one is more efficient? I think static is better; what do you
> >>> think?
> >>> 3- Should the post-install action be reboot, beep, or something else?
> >>> The installation manual says we shouldn't choose reboot if we want a
> >>> network boot installation, so I don't know which one is better and
> >>> less error-prone for me.
> >>> 4- When I tried to collect the MAC address of the client node (I have
> >>> just 2 nodes connected to the switch as a pilot project, one as the
> >>> server and the other as a client), the wrong MAC address was found. I
> >>> think it is the switch's MAC address, so I have to enter the client's
> >>> MAC address manually. Do you think this can cause errors in the
> >>> installation process?
> >>> Best regards.
> >>>
> >>> On Fri, Oct 31, 2008 at 1:57 AM, <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Have you checked the headnode, to make sure that your firewall is not
> >>>> running?
> >>>>
> >>>> For a GUI to turn off the firewall: system-config-securitylevel
> >>>>
> >>>> On Tue, Oct 28, 2008 at 2:51 PM, ali nazemian <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>>
> >>>>> Hi.
> >>>>> I want to set up a cluster for our HPC center using OSCAR, but I have
> >>>>> a problem with step 7, the cluster install.
> >>>>> Here is my problem:
> >>>>> When I run step 7, after some time a "tftp time out" error appears on
> >>>>> the client node and the node terminates the boot agent, and "Received
> >>>>> disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess" appears on the server node.
> >>>>> Here is the complete log of step 7:
> >>>>>
> >>>>>
> >>>>> --------------------------------------------------------------------------
> >>>>> --> Update Wizard Env (as needed)
> >>>>> --> Step 7: Running: ./post_install
> >>>>> Gathering processor count from oscarnode1.clusternet.
> >>>>> ssh: connect to host oscarnode1.clusternet port 22: Connection timed
> >>>>> out
> >>>>> Improper count (0) returned from machine oscarnode1.clusternet at
> >>>>> ./post_install line 83
> >>>>> main::get_numproc() called at ./post_install line 39
> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> --> About to run /opt/oscar/packages/loghost/scripts/post_install for
> >>>>> loghost
> >>>>> ************************* oscar_cluster *************************
> >>>>> --------- oscarnode1---------
> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
> >>>>> --> About to run /opt/oscar/packages/ganglia/scripts/post_install for
> >>>>> ganglia
> >>>>> [ganglia] Ganglia gmond configuration file modified, re-starting
> >>>>> daemon...
> >>>>> Shutting down GANGLIA gmond: [  OK  ]
> >>>>> Starting GANGLIA gmond: [  OK  ]
> >>>>> editing /etc/gmetad.conf
> >>>>> match: gridname\s+.*
> >>>>> match: data_source\s+.*
> >>>>> [ganglia] Ganglia gmetad configuration file modified, re-starting
> >>>>> daemon...
> >>>>> Shutting down GANGLIA gmetad: [  OK  ]
> >>>>> Starting GANGLIA gmetad: [  OK  ]
> >>>>> [ganglia] Starting up apache...
> >>>>> Stopping httpd: [  OK  ]
> >>>>> Starting httpd: [  OK  ]
> >>>>> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
> >>>>> ************************* oscar_cluster *************************
> >>>>> --------- oscarnode1---------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> --> About to run /opt/oscar/packages/torque/scripts/post_install for
> >>>>> torque
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> TORQUE mom config file updated with clienthost: server.clusternet
> >>>>> Pushing config file to clients...
> >>>>> Sending SIGHUP to all moms...
> >>>>> ************************* oscar_cluster *************************
> >>>>> --------- oscarnode1---------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> [torque] Updating pbs_server nodes
> >>>>> /opt/pbs/bin/pbsnodes: Server has no node list
> >>>>> Shutting down TORQUE Server: [  OK  ]
> >>>>> Starting TORQUE Server: [  OK  ]
> >>>>> [torque] Creating TORQUE workq queue...
> >>>>> Max open servers: 4
> >>>>> set queue workq resources_max.ncpus = 0
> >>>>> set queue workq resources_max.nodect = 0
> >>>>> set queue workq resources_available.nodect = 0
> >>>>> set server resources_available.ncpus = 0
> >>>>> set server resources_available.nodect = 0
> >>>>> set server resources_available.nodes = 0
> >>>>> set server resources_max.ncpus = 0
> >>>>> set server resources_max.nodes = 0
> >>>>> set server scheduler_iteration = 60
> >>>>> set server log_events = 64
> >>>>> Shutting down MAUI Scheduler: [  OK  ]
> >>>>> Starting MAUI Scheduler: [  OK  ]
> >>>>> --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
> >>>>> Setting default for tag mpi ("lam-7.1.2")
> >>>>> Attribute successfully set; new attribute setting will be effective for future shells
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> --> About to run /opt/oscar/packages/mta-config/scripts/post_install
> >>>>> for mta-config
> >>>>> ************************************ WARNING
> >>>>> ************************************
> >>>>> OSCAR could not set up the configuration for any mailing service on the server.
> >>>>> The current version of the mta-config package in OSCAR only supports
> >>>>> the Postfix mail transfer agent (MTA).
> >>>>> It looks like you have another MTA installed (e.g, sendmail or exim);
> >>>>> as such,
> >>>>> please be aware that OSCAR will not automatically configure it.
> >>>>> ************************************ WARNING
> >>>>> ************************************
> >>>>> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
> >>>>> Shutting down ntpd: [  OK  ]
> >>>>> Starting ntpd: [  OK  ]
> >>>>> ************************* oscar_cluster *************************
> >>>>> --------- oscarnode1---------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> --> About to run /opt/oscar/packages/opium/scripts/post_install for
> >>>>> opium
> >>>>> Not all hosts were accessible by c3! Will retry the update later
> >>>>> Could not find template for file switcher.ini
> >>>>> If this contains distro-specific lines, please create a template!
> >>>>> image:
> >>>>> $VAR1 = 'oscarimage';
> >>>>> ---------------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> Could not find template for file gshadow
> >>>>> If this contains distro-specific lines, please create a template!
> >>>>> image:
> >>>>> $VAR1 = 'oscarimage';
> >>>>> ---------------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> image:
> >>>>> $VAR1 = 'oscarimage';
> >>>>> ---------------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> image:
> >>>>> $VAR1 = 'oscarimage';
> >>>>> ---------------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> image:
> >>>>> $VAR1 = 'oscarimage';
> >>>>> ---------------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
> >>>>> [sender]
> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
> >>>>> --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
> >>>>> generating the /etc/odaserver file on all oscar clients
> >>>>> . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
> >>>>> ************************* oscar_cluster *************************
> >>>>> --------- oscarnode1---------
> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
> >>>>> SSH Server
> >>>>> Current FSM is SSH_Main_SSHProcess
> >>>>> Cluster setup complete!
> >>>>> --> Step 7: Successfully completed the cluster install
> >>>>> --> Update Wizard Env (as needed)
> >>>>>
> >>>>>
> >>>>> -----------------------------------------------------------------------------------
> >>>>> P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x
> >>>>> because it has a problem with my graphics cards.
> >>>>> Best regards.
> >>>>>
> >>>>> --
> >>>>> A.Nazemian
> >>>>>
> >>>>>
> >>>>>
> -------------------------------------------------------------------------
> >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
> >>>>> challenge
> >>>>> Build the coolest Linux based applications with Moblin SDK & win
> great
> >>>>> prizes
> >>>>> Grand prize is a trip for two to an Open Source event anywhere in the
> >>>>> world
> >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> >>>>> _______________________________________________
> >>>>> Oscar-users mailing list
> >>>>> Oscar-users@lists.sourceforge.net
> >>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> A.Nazemian
> >>
> >>
> >>
> >> --
> >> A.Nazemian
> >
> >
> >
> > --
> > A.Nazemian
> >
> >
> >
>
>
--
A.Nazemian