Hi again, I did what you said:
1- Booted the CentOS 4.7 install CD and chose linux rescue mode.
2- Copied modprobe.conf from the /tmp folder (mounted by rescue mode) to /etc.
But nothing happened; the kernel panic error still exists.
I checked the modprobe.conf from rescue mode, and it looked like this:
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter ata_piix
I also copied the modprobe.conf file from the head node to /etc on the client node,
but the problem still exists. Here is the head node's modprobe.conf file:
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-hda-intel
options snd-card-0 index=0
install snd-hda-intel /sbin/modprobe --ignore-install snd-hda-intel && /usr/sbin/alsactl restore >/dev/null 2>&1 || :
remove snd-hda-intel { /usr/sbin/alsactl store >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-hda-intel
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
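One way to answer the question that follows is to compare only the alias lines of the two files. The Python sketch below does that, assuming the two listings above are pasted in verbatim (the sound-driver install/remove lines are left out because they define no alias). It is an illustration, not an OSCAR tool.

```python
# Compare the "alias NAME MODULE" lines of two modprobe.conf listings.
# The listings are the ones quoted in this message; the sound-card
# install/remove lines are omitted because they define no alias.

CLIENT_CONF = """\
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter ata_piix
"""

HEAD_CONF = """\
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter ata_piix
alias snd-card-0 snd-hda-intel
options snd-card-0 index=0
alias usb-controller ehci-hcd
alias usb-controller1 uhci-hcd
"""


def parse_aliases(text):
    """Return {alias_name: module} for every 'alias NAME MODULE' line."""
    aliases = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "alias":
            aliases[fields[1]] = fields[2]
    return aliases


client = parse_aliases(CLIENT_CONF)
head = parse_aliases(HEAD_CONF)

# Aliases both nodes define but map to different modules.
changed = {name: (client[name], head[name])
           for name in client.keys() & head.keys()
           if client[name] != head[name]}
# Aliases that only the head node defines.
only_head = sorted(head.keys() - client.keys())

print("scsi_hostadapter:", client.get("scsi_hostadapter"), "/", head.get("scsi_hostadapter"))
print("different modules:", changed)
print("only on head node:", only_head)
```

Both files map scsi_hostadapter to ata_piix, and the head node only adds sound and USB aliases, which suggests the disk-driver alias is not what differs between the nodes. If, as on stock CentOS 4, the disk driver is loaded from the initrd, editing modprobe.conf alone would not change boot behaviour until the initrd is rebuilt.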
Are you sure my problem is caused by the modprobe.conf file?
FYI: I have 3 SATA hard drives on both the head node and the client node. I only installed
CentOS 4.7 on the first hard drive; the second and third ones are unallocated.
Thanks for your help.
On Tue, Nov 4, 2008 at 2:08 AM, Ali Nazemian <[EMAIL PROTECTED]> wrote:
> Since we have just 22 nodes, I think the rescue disk approach would be reasonable.
> I'll do that.
> Also, I want to install some software on the nodes, such as Fluent, Ansys,
> etc. How can I put these packages in the image instead of installing them
> on the nodes one by one?
> Cheers.
>
>
> On Tue, Nov 4, 2008 at 1:05 AM, Michael Edwards <[EMAIL PROTECTED]> wrote:
>
>> hd0 is just GRUB's label for one of the hard drives; that is fairly normal.
>> For some reason the nodes don't like the drives that OSCAR picked
>> for them to use.
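To make the (hd0,0) notation concrete: GRUB legacy counts BIOS disks and partitions from 0, while Linux letters disks from "a" and numbers partitions from 1. A small Python sketch of that translation follows. It assumes the BIOS disk order matches the kernel's disk order, which is common but not guaranteed, and grub_to_linux is a hypothetical helper, not part of GRUB.

```python
import re


def grub_to_linux(grub_dev, disk_prefix="sd"):
    """Translate GRUB legacy '(hdN)' or '(hdN,M)' into a Linux name like '/dev/sda1'.

    GRUB legacy counts disks and partitions from 0; Linux letters disks
    from 'a' and numbers partitions from 1.
    ASSUMPTION: BIOS disk order matches the kernel's disk order.
    """
    m = re.fullmatch(r"\(hd(\d+)(?:,(\d+))?\)", grub_dev.strip())
    if m is None:
        raise ValueError(f"not a GRUB legacy disk name: {grub_dev!r}")
    disk = int(m.group(1))
    if disk > 25:
        raise ValueError("this sketch only handles the first 26 disks")
    name = f"/dev/{disk_prefix}{chr(ord('a') + disk)}"
    if m.group(2) is not None:          # a partition was given
        name += str(int(m.group(2)) + 1)
    return name


print(grub_to_linux("(hd0,0)"))        # SATA/SCSI naming
print(grub_to_linux("(hd1)"))          # whole second disk
print(grub_to_linux("(hd0,0)", "hd"))  # old IDE naming
```

Under that assumption, "root (hd0,0)" in the menu entries quoted further down only says that GRUB loads the kernel and initrd from the first partition of the first BIOS disk (/dev/sda1). It is separate from the kernel's own root= parameter.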
>>
>> What you'll need to do is boot a node from a rescue CD and copy the
>> modprobe.conf file it uses (assuming it boots properly and can mount
>> the node's disk) to overwrite the one made by OSCAR.
>>
>> Then use the tips here
>> (http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipLDAP) to
>> change the OSCAR files used when imaging the nodes for future nodes.
>>
>> Then, depending on how many nodes you have, you can either reimage the
>> nodes or just repeat the rescue CD boot on each one. Fix the modprobe.conf
>> in the image though, in case you reimage them later.
>>
>> Generally, the way I work is to make changes to a node, note which
>> changes worked, update the image with those changes, then
>> reimage the cluster if possible. This depends on how many users you
>> have, of course :)
>>
>> On Sun, Nov 2, 2008 at 4:08 AM, Ali Nazemian <[EMAIL PROTECTED]>
>> wrote:
>> > Sorry, I forgot to write something in my last post, so please add these
>> > sentences to it:
>> > On the client node boot menu I have 2 choices (as expected):
>> > 2.6.9-78.ELsmp_(hd0,0)
>> > 2.6.9-78.EL_(hd0,0)
>> > What does hd0 mean? I haven't seen anything like this in a Linux boot menu.
>> > Anyway, I tried to edit the "2.6.9-78.ELsmp_(hd0,0)" entry, and a new page
>> > with 3 lines showed up:
>> > root (hd0,0)
>> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6
>> > initrd /sc-initrd-2.6.9-78.ELsmp.gz
>> > I checked the server node, and for the server the boot menu looks like
>> > this (edit command):
>> > root (hd0,0)
>> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb...
>> > initrd /initrd-2.6.9-78.ELsmp.img
>> > Is this usual, or am I in trouble?
>> > FYI: I used the UYOK method.
>> > cheers.
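The two kernel lines differ mainly in the root= argument: the client is told to mount the plain partition /dev/sda6, the server an LVM logical volume. A minimal Python sketch that pulls root= out of each line follows. The server line is cut at "rhgb" because the rest was elided in the message, and treating /dev/VolGroup* as LVM only matches CentOS's default volume-group naming (an assumption).

```python
def kernel_root(kernel_line):
    """Return the value of the root= argument on a GRUB legacy 'kernel' line, or None."""
    for arg in kernel_line.split()[1:]:      # skip the word 'kernel'
        if arg.startswith("root="):
            return arg[len("root="):]
    return None


def root_kind(root):
    # Heuristic (assumption): CentOS's installer names LVM volume groups VolGroup00, ...
    return "LVM logical volume" if root.startswith("/dev/VolGroup") else "plain partition"


CLIENT_LINE = "kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6"
SERVER_LINE = "kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb"

for node, line in (("client", CLIENT_LINE), ("server", SERVER_LINE)):
    root = kernel_root(line)
    print(f"{node}: root={root} ({root_kind(root)})")
```

A different root device on the client is not alarming in itself, since the client disk is laid out from OSCAR's disk partition file rather than copied from the server; the open question is whether the client's initrd can reach /dev/sda6.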
>> >
>> > On Sun, Nov 2, 2008 at 12:15 PM, Ali Nazemian <[EMAIL PROTECTED]>
>> wrote:
>> >>
>> >> Hi again,
>> >> I set the image post-install action to "beep" and started the installation
>> >> process. After a while it seemed that the installation process on the client
>> >> node had finished and was waiting for a reboot, so I reset the client node.
>> >> After that I changed the boot priority to hard disk first; the client node
>> >> then tried to boot from the hard disk and load CentOS from it, but these
>> >> errors showed up:
>> >> Mounted /proc filesystem
>> >> Mounting sysfs
>> >> Creating /dev
>> >> Starting udev
>> >> Loading jbd.ko module
>> >> Loading ext3.ko module
>> >> Creating root device
>> >> Mounting root filesystem
>> >> mount:error 6 mounting ext3
>> >> mount:error 2 mounting none
>> >> Switching to new root
>> >> switchroot:mount failed:22
>> >> unmount /initrd/dev failed:2
>> >> kernel panic - not syncing : Attempted to kill init!
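The numbers in the failing lines above look like standard Linux errno values; that is an assumption about how the initrd's mount and switchroot commands report failures. A short Python sketch that decodes them with the standard errno module and os.strerror:

```python
import errno
import os
import re

# Failure lines copied from the boot log above.
BOOT_LOG = """\
mount:error 6 mounting ext3
mount:error 2 mounting none
switchroot:mount failed:22
unmount /initrd/dev failed:2
"""

# ASSUMPTION: the number after "error" or "failed:" is a raw errno value.
CODE_RE = re.compile(r"(?:error\s+|failed:\s*)(\d+)")

decoded = []
for line in BOOT_LOG.splitlines():
    match = CODE_RE.search(line)
    if match:
        code = int(match.group(1))
        decoded.append((line, code, errno.errorcode.get(code, "unknown")))

for line, code, name in decoded:
    print(f"{line:32} -> {name}: {os.strerror(code)}")
```

On Linux, 6 is ENXIO ("No such device or address"), 2 is ENOENT and 22 is EINVAL. An ENXIO while mounting the root filesystem typically means no working device sits behind the requested root device, and the later failures are likely knock-on effects of that first one. The log only shows jbd.ko and ext3.ko being loaded before the mount; whether the SATA driver is built into the kernel or missing from the initrd cannot be told from this output alone.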
>> >> -------------------
>> >> On the server node, in the monitoring page, the installation of this node
>> >> is still green and its progress is "beeping", so it seems to have been
>> >> successful so far.
>> >> What is the problem? What should I do now?
>> >> cheers.
>> >>
>> >> On Sat, Nov 1, 2008 at 11:41 PM, Ali Nazemian <[EMAIL PROTECTED]>
>> >> wrote:
>> >>>
>> >>> I checked what you said, and it seems the firewall was enabled; although
>> >>> ssh was allowed, the firewall was still on, so this problem was solved by
>> >>> disabling the firewall. After that, some new errors showed up on the client
>> >>> node. It was something about a partitioning problem on the client node that
>> >>> I think was related to the ide.disk/scsi.disk file, so I have some questions
>> >>> about the imaging process in the OSCAR installation that will probably help
>> >>> me install it without any errors:
>> >>> 1- In the "build the image" step we have to choose a disk partition file.
>> >>> The docs say we should choose scsi.disk for SCSI disks and ide.disk for IDE
>> >>> disks, but what about SATA IDE disks? As you know, hda naming is for IDE and
>> >>> sda is for SCSI, and I saw that SATA uses sda too, so should I choose
>> >>> scsi.disk?
>> >>> 2- In this step we also have to choose the IP assignment method; the default
>> >>> is static. Which one should I choose, static or DHCP? Which one is more
>> >>> efficient? I think static is better; what do you think?
>> >>> 3- Should the post-install action be reboot, beep, or something else? The
>> >>> installation manual says we shouldn't choose reboot if we want to use
>> >>> network boot installation. Now I don't know which one is better and
>> >>> error-free for me.
>> >>> 4- I noticed something: when I tried to collect the MAC address of the
>> >>> client node (I have just 2 nodes connected to a switch as a pilot project,
>> >>> one as the server and the other as a client), a wrong MAC address was
>> >>> found. I think it was the switch's MAC address, so I had to enter the
>> >>> client's MAC address manually. Do you think this can cause errors in the
>> >>> installation process?
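Items 1 and 4 can both be checked mechanically. The Python sketch below uses hypothetical helpers, not OSCAR code. It lists whole disks from /proc/partitions text and picks scsi.disk for sdX names and ide.disk for hdX names, on the assumption that the kernel's naming is what matters (which fits the observation that SATA disks show up as sda). It also normalizes a hand-typed MAC address so a typo is caught before it goes into OSCAR. The sample /proc/partitions contents (including the block counts) and the MAC address are made up.

```python
import re

# Made-up /proc/partitions contents for a node with three SATA disks.
SAMPLE_PROC_PARTITIONS = """\
major minor  #blocks  name

   8     0  156290904 sda
   8     1     104391 sda1
   8     6   20482843 sda6
   8    16  156290904 sdb
   8    32  156290904 sdc
"""


def whole_disks(proc_partitions):
    """Return whole-disk names such as 'sda' or 'hda' (partition names end in a digit)."""
    disks = []
    for line in proc_partitions.splitlines()[2:]:   # skip header and blank line
        fields = line.split()
        if len(fields) == 4 and not fields[3][-1].isdigit():
            disks.append(fields[3])
    return disks


def oscar_disk_file(disk):
    """ASSUMPTION: choose the OSCAR disk file from the kernel's device naming."""
    if disk.startswith("sd"):
        return "scsi.disk"
    if disk.startswith("hd"):
        return "ide.disk"
    raise ValueError(f"unrecognised disk name: {disk}")


# Six hex octets, all separated by ':' or all by '-', or not separated at all.
MAC_RE = re.compile(r"[0-9a-f]{2}([:-]?)[0-9a-f]{2}(?:\1[0-9a-f]{2}){4}", re.IGNORECASE)


def normalize_mac(text):
    """Return the MAC address in lower-case aa:bb:cc:dd:ee:ff form, or raise ValueError."""
    candidate = text.strip()
    if not MAC_RE.fullmatch(candidate):
        raise ValueError(f"not a MAC address: {text!r}")
    digits = re.sub(r"[:-]", "", candidate).lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


disks = whole_disks(SAMPLE_PROC_PARTITIONS)
print(disks, "->", {oscar_disk_file(d) for d in disks})
print(normalize_mac("00-1A-2B-3C-4D-5E"))
```

The normalizer only checks the format; it cannot tell whether the address really belongs to the client's NIC rather than the switch.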
>> >>> Best regards.
>> >>>
>> >>> On Fri, Oct 31, 2008 at 1:57 AM, <[EMAIL PROTECTED]> wrote:
>> >>>>
>> >>>> Have you checked the headnode, to make sure that your firewall is not
>> >>>> running?
>> >>>>
>> >>>> for a GUI to turn off the firewall: system-config-securitylevel
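The step 7 log quoted further down fails with "connect to host oscarnode1 port 22: Connection timed out". A timeout (as opposed to "connection refused") typically means packets are being dropped, by a firewall or because the host is not up, rather than that sshd is not listening. A minimal Python sketch that tells these cases apart from the head node; probe is a hypothetical helper, not an OSCAR command.

```python
import socket


def probe(host, port=22, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"         # host answered, but nothing listens on the port
    except socket.timeout:
        return "timeout"         # no answer at all: packets dropped or host down
    except OSError as exc:
        return f"error: {exc}"   # e.g. name resolution failure, no route to host
```

For example, probe("oscarnode1") run on the head node should return "open" once the client is up and no firewall is in the way.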
>> >>>>
>> >>>> On Tue, Oct 28, 2008 at 2:51 PM, ali nazemian <[EMAIL PROTECTED]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Hi.
>> >>>>> I want to set up clustering for our HPC center using OSCAR, but I
>> >>>>> have a problem with step 7, installing the cluster.
>> >>>>> Here is my problem:
>> >>>>> When I run step 7, after some time a "tftp timeout" error appears on
>> >>>>> the client node and the node terminates the boot agent, and "Received
>> >>>>> disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess" appears on the server node.
>> >>>>> Here is the complete log of step 7:
>> >>>>>
>> >>>>> --------------------------------------------------------------------------
>> >>>>> --> Update Wizard Env (as needed)
>> >>>>> --> Step 7: Running: ./post_install
>> >>>>> Gathering processor count from oscarnode1.clusternet.
>> >>>>> ssh: connect to host oscarnode1.clusternet port 22: Connection timed out
>> >>>>> Improper count (0) returned from machine oscarnode1.clusternet at ./post_install line 83
>> >>>>> main::get_numproc() called at ./post_install line 39
>> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> --> About to run /opt/oscar/packages/loghost/scripts/post_install for loghost
>> >>>>> ************************* oscar_cluster *************************
>> >>>>> --------- oscarnode1---------
>> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>> >>>>> --> About to run /opt/oscar/packages/ganglia/scripts/post_install for ganglia
>> >>>>> [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
>> >>>>> Shutting down GANGLIA gmond: [  OK  ]
>> >>>>> Starting GANGLIA gmond: [  OK  ]
>> >>>>> editing /etc/gmetad.conf
>> >>>>> match: gridname\s+.*
>> >>>>> match: data_source\s+.*
>> >>>>> [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
>> >>>>> Shutting down GANGLIA gmetad: [  OK  ]
>> >>>>> Starting GANGLIA gmetad: [  OK  ]
>> >>>>> [ganglia] Starting up apache...
>> >>>>> Stopping httpd: [  OK  ]
>> >>>>> Starting httpd: [  OK  ]
>> >>>>> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
>> >>>>> ************************* oscar_cluster *************************
>> >>>>> --------- oscarnode1---------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> --> About to run /opt/oscar/packages/torque/scripts/post_install for torque
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> TORQUE mom config file updated with clienthost: server.clusternet
>> >>>>> Pushing config file to clients...
>> >>>>> Sending SIGHUP to all moms...
>> >>>>> ************************* oscar_cluster *************************
>> >>>>> --------- oscarnode1---------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> [torque] Updating pbs_server nodes
>> >>>>> /opt/pbs/bin/pbsnodes: Server has no node list
>> >>>>> Shutting down TORQUE Server: [  OK  ]
>> >>>>> Starting TORQUE Server: [  OK  ]
>> >>>>> [torque] Creating TORQUE workq queue...
>> >>>>> Max open servers: 4
>> >>>>> set queue workq resources_max.ncpus = 0
>> >>>>> set queue workq resources_max.nodect = 0
>> >>>>> set queue workq resources_available.nodect = 0
>> >>>>> set server resources_available.ncpus = 0
>> >>>>> set server resources_available.nodect = 0
>> >>>>> set server resources_available.nodes = 0
>> >>>>> set server resources_max.ncpus = 0
>> >>>>> set server resources_max.nodes = 0
>> >>>>> set server scheduler_iteration = 60
>> >>>>> set server log_events = 64
>> >>>>> Shutting down MAUI Scheduler: [  OK  ]
>> >>>>> Starting MAUI Scheduler: [  OK  ]
>> >>>>> --> About to run /opt/oscar/packages/switcher/scripts/post_install for switcher
>> >>>>> Setting default for tag mpi ("lam-7.1.2")
>> >>>>> Attribute successfully set; new attribute setting will be effective for future shells
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> --> About to run /opt/oscar/packages/mta-config/scripts/post_install for mta-config
>> >>>>> ************************************ WARNING ************************************
>> >>>>> OSCAR could not set up the configuration for any mailing service on the server.
>> >>>>> The current version of the mta-config package in OSCAR only supports
>> >>>>> the Postfix mail transfer agent (MTA).
>> >>>>> It looks like you have another MTA installed (e.g, sendmail or exim); as such,
>> >>>>> please be aware that OSCAR will not automatically configure it.
>> >>>>> ************************************ WARNING ************************************
>> >>>>> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for ntpconfig
>> >>>>> Shutting down ntpd: [  OK  ]
>> >>>>> Starting ntpd: [  OK  ]
>> >>>>> ************************* oscar_cluster *************************
>> >>>>> --------- oscarnode1---------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> --> About to run /opt/oscar/packages/opium/scripts/post_install for opium
>> >>>>> Not all hosts were accessible by c3! Will retry the update later
>> >>>>> Could not find template for file switcher.ini
>> >>>>> If this contains distro-specific lines, please create a template!
>> >>>>> image:
>> >>>>> $VAR1 = 'oscarimage';
>> >>>>> ---------------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> Could not find template for file gshadow
>> >>>>> If this contains distro-specific lines, please create a template!
>> >>>>> image:
>> >>>>> $VAR1 = 'oscarimage';
>> >>>>> ---------------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> image:
>> >>>>> $VAR1 = 'oscarimage';
>> >>>>> ---------------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> image:
>> >>>>> $VAR1 = 'oscarimage';
>> >>>>> ---------------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> image:
>> >>>>> $VAR1 = 'oscarimage';
>> >>>>> ---------------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far) [sender]
>> >>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>> >>>>> --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
>> >>>>> generating the /etc/odaserver file on all oscar clients
>> >>>>> . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
>> >>>>> ************************* oscar_cluster *************************
>> >>>>> --------- oscarnode1---------
>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>> >>>>> Current FSM is SSH_Main_SSHProcess
>> >>>>> Cluster setup complete!
>> >>>>> --> Step 7: Successfully completed the cluster install
>> >>>>> --> Update Wizard Env (as needed)
>> >>>>>
>> >>>>> -----------------------------------------------------------------------------------
>> >>>>> P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x
>> >>>>> because it has problems with my graphics cards.
>> >>>>> Best regards.
>> >>>>>
>> >>>>> --
>> >>>>> A.Nazemian
>> >>>>>
>> >>>>>
>> >>>>>
>> -------------------------------------------------------------------------
>> >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>> >>>>> challenge
>> >>>>> Build the coolest Linux based applications with Moblin SDK & win
>> great
>> >>>>> prizes
>> >>>>> Grand prize is a trip for two to an Open Source event anywhere in
>> the
>> >>>>> world
>> >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> >>>>> _______________________________________________
>> >>>>> Oscar-users mailing list
>> >>>>> Oscar-users@lists.sourceforge.net
>> >>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> -------------------------------------------------------------------------
>> >>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>> >>>> challenge
>> >>>> Build the coolest Linux based applications with Moblin SDK & win
>> great
>> >>>> prizes
>> >>>> Grand prize is a trip for two to an Open Source event anywhere in the
>> >>>> world
>> >>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> >>>> _______________________________________________
>> >>>> Oscar-users mailing list
>> >>>> Oscar-users@lists.sourceforge.net
>> >>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> A.Nazemian
>> >>
>> >>
>> >>
>> >> --
>> >> A.Nazemian
>> >
>> >
>> >
>> > --
>> > A.Nazemian
>> >
>> >
>> >
>> >
>>
>>
>
>
>
> --
> A.Nazemian
>
--
A.Nazemian