hd0 is just GRUB's label for the first hard drive, and (hd0,0) is the first partition on it, so that part is fairly normal. For some reason the nodes don't like the drives that OSCAR picked for them to use.
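To make the (hd0,0) notation concrete, here is a small Python sketch that reads the kind of boot entries pasted below. It is only my own illustration, not part of OSCAR or GRUB, and the function names are made up. GRUB counts disks and partitions from zero, while the root= argument on the kernel line names the Linux root filesystem separately.

```python
import re

# GRUB legacy device notation: (hd<disk>,<partition>), both counted from zero.
GRUB_DEVICE = re.compile(r"\(hd(\d+),(\d+)\)")


def describe_grub_device(entry):
    """Return (disk, partition) as 1-based numbers, or None if no device is found."""
    match = GRUB_DEVICE.search(entry)
    if match is None:
        return None
    disk, partition = (int(group) for group in match.groups())
    return disk + 1, partition + 1


def root_device(kernel_line):
    """Return the value of the root= argument on a GRUB 'kernel' line, if any."""
    for word in kernel_line.split():
        if word.startswith("root="):
            return word[len("root="):]
    return None


if __name__ == "__main__":
    print(describe_grub_device("root(hd0,0)"))
    print(root_device("kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6"))
```

For the client entry this reports the first partition of the first disk for GRUB and /dev/sda6 for the kernel, so the partition GRUB reads the kernel from and the root filesystem are two different partitions, which is normal.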
What you'll need to do is boot a rescue CD on a node and copy the modprobe.conf file it uses (assuming it boots properly and can mount the node's disk) over the one made by OSCAR. Then use the tips here (http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipLDAP) to change the OSCAR files used when imaging the nodes, so that future nodes get the fix. After that, depending on how many nodes you have, you can either reimage the nodes or just repeat the rescue CD boot on each one. Fix the modprobe.conf in the image either way, in case you reimage the nodes later.

Generally, the way I work is to make changes to a node, note which changes worked, update the image with those changes, then reimage the cluster if possible. This depends on how many users you have, of course :)

On Sun, Nov 2, 2008 at 4:08 AM, Ali Nazemian <[EMAIL PROTECTED]> wrote:
> Sorry i forgot to write something in my last post , so add these sentences
> to that:
> on client node boot menu i have 2 choice( it's obvious) :
> 2.6.9-78.ELsmp_(hd0,0)
> 2.6.9-78.EL_(hd0,0)
> what hd0 means? i didnt saw something like this in Linux boot menu, anyway ,
> i tried to edit "2.6.9-78.ELsmp_(hd0,0)" command , and a new page with 3
> choices showed up:
> root(hd0,0)
> kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6
> initrd /sc-initrd-2.6.9-78.ELsmp.gz
> i checked server node , and for server , i have a boot menu like this:( edit
> command)
> root(hd0,0)
> kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/VolGroup00/logvoI00 rhgb...
> initrd /initrd-2.6.9-78.Elsmp.img
> is this usuall ? or i am in trouble?
> FYI: i used UYOK method.
> cheers.
>
> On Sun, Nov 2, 2008 at 12:15 PM, Ali Nazemian <[EMAIL PROTECTED]> wrote:
>>
>> Hi again ,
>> I set image post action to "beep" and start installation process , after a
>> while it seems that installation process on client node , finished , and its
>> waiting for reboot , so i reset node client , after that i changed boot
>> priority to hard disk first , then node client tried to boot from hard disk
>> and loading centos on it , but these errors showed up:
>>
>> Mounted /proc filesystem
>> Mounting sysfs
>> Creating /dev
>> Starting udev
>> Loading jbd.ko module
>> Loading ext3.ko module
>> Creating root device
>> Mounting root filesystem
>> mount:error 6 mounting ext3
>> mount:error 2 mounting none
>> Switching to new root
>> switchroot:mount failed:22
>> unmount /initrd/dev failed:2
>> kernel panic - not syncing : Attempted to kill init!
>> -------------------
>> on the server node , in monitoring page , still installation of this node
>> is green and its progress is "beeping" , so its seems it was successful
>> until now.
>> what is the problem? what should i do now?!
>> cheers.
>>
>> On Sat, Nov 1, 2008 at 11:41 PM, Ali Nazemian <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> I checked what u said , and it seems that firewall was enable , although
>>> ssh was allowed , but that was enable , so this problem solved by disabling
>>> firewall , after that some new errors showed up on client node ,it was
>>> something about portioning problem in client node that i think it was
>>> related to ide.disk/scsi.disk file , so i have questions about imaging
>>> process in OSCAR installation, that probably can help me to install it
>>> without any errors:
>>> 1- in step "build the image" we should choose disk partion file , they
>>> said we should choose scsi.disk for scsi disks and ide.disk for IDE disks ,
>>> but what about SATA IDE disks? as u know hda partion format for IDE and sda
>>> is for scsi , i saw SATA is use sda too , so should i choose scsi.disk?!
>>> 2- in this step , we should ip assignment method , dafult value for that
>>> is static , which one should i choose?! static or dhcp?! which one is more
>>> efficeient? i think static is better, what do u think?!
>>> 3- post install action , should be reboot , beep or something else? in
>>> istallation manual it says we shouldn't choose reboot if we want to choose
>>> network boot installation, now i dont know which one is better and errorless
>>> for me?!
>>> 4- I found something , when i want to find mac address of the client node
>>> , ( i have just 2 node connected to switch as a pilot project , one of them
>>> as a server and another one as a clinet ) wrong mac address found , i think
>>> it is switch mac address that found , so i should insert client mac address
>>> manually , do u think it can cause some errors in installation process?!
>>> Best regards.
>>>
>>> On Fri, Oct 31, 2008 at 1:57 AM, <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Have you checked the headnode, to make sure that your firewall is not
>>>> running?
>>>>
>>>> for a GUI to turn off the firewall: system-config-securitylevel
>>>>
>>>> On Tue, Oct 28, 2008 at 2:51 PM, ali nazemian <[EMAIL PROTECTED]>
>>>> wrote:
>>>>>
>>>>> Hi.
>>>>> I want to execute clustering for our HPC center using OSCAR, but i have
>>>>> a problem with step 7, installing cluster.
>>>>> Here is my problem :
>>>>> After i want to run step 7 , after some time on client node "tftp time
>>>>> out" error appeared and node terminate the boot agent. and "Received
>>>>> disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess" appeared on server node.
>>>>> Here is the complete log of step 7:
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> --> Update Wizard Env (as needed)
>>>>> --> Step 7: Running: ./post_install
>>>>> Gathering processor count from oscarnode1.clusternet.
>>>>> ssh: connect to host oscarnode1.clusternet port 22: Connection timed
>>>>> out
>>>>> Improper count (0) returned from machine oscarnode1.clusternet at
>>>>> ./post_install line 83
>>>>> main::get_numproc() called at ./post_install line 39
>>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> --> About to run /opt/oscar/packages/loghost/scripts/post_install for
>>>>> loghost
>>>>> ************************* oscar_cluster *************************
>>>>> --------- oscarnode1---------
>>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>>>>> --> About to run /opt/oscar/packages/ganglia/scripts/post_install for
>>>>> ganglia
>>>>> [ganglia] Ganglia gmond configuration file modified, re-starting
>>>>> daemon...
>>>>> Shutting down GANGLIA gmond: [  OK  ]
>>>>> Starting GANGLIA gmond: [  OK  ]
>>>>> editing /etc/gmetad.conf
>>>>> match: gridname\s+.*
>>>>> match: data_source\s+.*
>>>>> [ganglia] Ganglia gmetad configuration file modified, re-starting
>>>>> daemon...
>>>>> Shutting down GANGLIA gmetad: [  OK  ]
>>>>> Starting GANGLIA gmetad: [  OK  ]
>>>>> [ganglia] Starting up apache...
>>>>> Stopping httpd: [  OK  ]
>>>>> Starting httpd: [  OK  ]
>>>>> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
>>>>> ************************* oscar_cluster *************************
>>>>> --------- oscarnode1---------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> --> About to run /opt/oscar/packages/torque/scripts/post_install for
>>>>> torque
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> TORQUE mom config file updated with clienthost: server.clusternet
>>>>> Pushing config file to clients...
>>>>> Sending SIGHUP to all moms...
>>>>> ************************* oscar_cluster *************************
>>>>> --------- oscarnode1---------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> [torque] Updating pbs_server nodes
>>>>> /opt/pbs/bin/pbsnodes: Server has no node list
>>>>> Shutting down TORQUE Server: [  OK  ]
>>>>> Starting TORQUE Server: [  OK  ]
>>>>> [torque] Creating TORQUE workq queue...
>>>>> Max open servers: 4
>>>>> set queue workq resources_max.ncpus = 0
>>>>> set queue workq resources_max.nodect = 0
>>>>> set queue workq resources_available.nodect = 0
>>>>> set server resources_available.ncpus = 0
>>>>> set server resources_available.nodect = 0
>>>>> set server resources_available.nodes = 0
>>>>> set server resources_max.ncpus = 0
>>>>> set server resources_max.nodes = 0
>>>>> set server scheduler_iteration = 60
>>>>> set server log_events = 64
>>>>> Shutting down MAUI Scheduler: [  OK  ]
>>>>> Starting MAUI Scheduler: [  OK  ]
>>>>> --> About to run /opt/oscar/packages/switcher/scripts/post_install for
>>>>> switcher
>>>>> Setting default for tag mpi ("lam-7.1.2")
>>>>> Attribute successfully set; new attribute setting will be effective for
>>>>> future shells
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> --> About to run /opt/oscar/packages/mta-config/scripts/post_install
>>>>> for mta-config
>>>>> ************************************ WARNING
>>>>> ************************************
>>>>> OSCAR could not set up the configuration for any mailing service on the
>>>>> server.
>>>>> The current version of the mta-config package in OSCAR only supports
>>>>> the Postfix mail transfer agent (MTA).
>>>>> It looks like you have another MTA installed (e.g, sendmail or exim);
>>>>> as such,
>>>>> please be aware that OSCAR will not automatically configure it.
>>>>> ************************************ WARNING
>>>>> ************************************
>>>>> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install for
>>>>> ntpconfig
>>>>> Shutting down ntpd: [  OK  ]
>>>>> Starting ntpd: [  OK  ]
>>>>> ************************* oscar_cluster *************************
>>>>> --------- oscarnode1---------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> --> About to run /opt/oscar/packages/opium/scripts/post_install for
>>>>> opium
>>>>> Not all hosts were accessible by c3! Will retry the update later
>>>>> Could not find template for file switcher.ini
>>>>> If this contains distro-specific lines, please create a template!
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> Could not find template for file gshadow
>>>>> If this contains distro-specific lines, please create a template!
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> image:
>>>>> $VAR1 = 'oscarimage';
>>>>> ---------------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>>>> [sender]
>>>>> rsync error: error in rsync protocol data stream (code 12) at io.c(359)
>>>>> --> About to run /opt/oscar/packages/oda/scripts/post_install for oda
>>>>> generating the /etc/odaserver file on all oscar clients
>>>>> . /etc/profile.d/c3.sh && cexec 'echo oscar_server > /etc/odaserver'
>>>>> ************************* oscar_cluster *************************
>>>>> --------- oscarnode1---------
>>>>> Received disconnect from 192.168.0.2: 2: The connection is closed by
>>>>> SSH Server
>>>>> Current FSM is SSH_Main_SSHProcess
>>>>> Cluster setup complete!
>>>>> --> Step 7: Successfully completed the cluster install
>>>>> --> Update Wizard Env (as needed)
>>>>>
>>>>> -----------------------------------------------------------------------------------
>>>>> P.S: i am using OSCAR 5 on centos 4.7-x86_64 , i cant use centos 5.X
>>>>> because of it has problem with my graphic cards.
>>>>> Best regards.
>>>>>
>>>>> --
>>>>> A.Nazemian
>>>>>
>>>>>
>>>>> -------------------------------------------------------------------------
>>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>>> challenge
>>>>> Build the coolest Linux based applications with Moblin SDK & win great
>>>>> prizes
>>>>> Grand prize is a trip for two to an Open Source event anywhere in the
>>>>> world
>>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>> _______________________________________________
>>>>> Oscar-users mailing list
>>>>> Oscar-users@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users