Something else: for the head node I chose automatic partitioning,
so the partition sizes on the head node and the client node are different.
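
One way to line up the two facts quoted further down, the client's root= kernel argument and the scsi_hostadapter alias in modprobe.conf, is a small parser. This is only a sketch: the function names are hypothetical, the sample text is copied from the messages below, and it only reads text. It does not inspect the initrd, so it cannot confirm whether a missing disk driver is what causes the panic.

```python
from typing import List, Optional


def storage_drivers(modprobe_conf: str) -> List[str]:
    """Return the modules named by 'alias scsi_hostadapter*' lines."""
    drivers = []
    for line in modprobe_conf.splitlines():
        fields = line.split()
        # A storage alias line has exactly three fields: alias <name> <module>
        if (len(fields) == 3 and fields[0] == "alias"
                and fields[1].startswith("scsi_hostadapter")):
            drivers.append(fields[2])
    return drivers


def root_device(kernel_line: str) -> Optional[str]:
    """Return the value of the root= argument on a GRUB kernel line, if any."""
    for token in kernel_line.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


# Sample input copied from the quoted messages in this thread.
client_modprobe = """alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter ata_piix
"""
client_kernel = "kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6"

print(storage_drivers(client_modprobe))
print(root_device(client_kernel))
```

Running the same two functions on the head node's files gives a quick side-by-side of which disk driver and root device each machine expects.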

On Tue, Nov 4, 2008 at 12:17 PM, Ali Nazemian <[EMAIL PROTECTED]> wrote:

> Hi again, I did what you said:
> 1- Booted the CentOS 4.7 install CD and chose linux rescue mode.
> 2- Copied modprobe.conf from the /tmp folder (mounted by rescue mode) to /etc,
> but nothing happened; the kernel panic error still exists.
> I checked modprobe.conf from rescue, and it was like this:
> alias eth0 e1000
> alias eth1 e1000
> alias scsi_hostadapter ata_piix
>
> I also copied the modprobe.conf file from the head node to /etc on the client node,
> but the problem still exists. Here is the head node modprobe.conf file:
> alias eth0 e1000
> alias eth1 e1000
> alias scsi_hostadapter ata_piix
> alias snd-card-0 snd-hda-intel
> options snd-card-0 index=0
> install snd-hda-intel /sbin/modprobe --ignore-install snd-hda-intel && /usr/sbin/alsactl restore >/dev/null 2>&1 || :
> remove snd-hda-intel { /usr/sbin/alsactl store >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-hda-intel
> alias usb-controller ehci-hcd
> alias usb-controller1 uhci-hcd
> Are you sure my problem is caused by the modprobe.conf file?
> FYI: I have 3 SATA hard drives on the head node and on the client node; I only installed
> CentOS 4.7 on the first hard drive, and the second and third are unallocated.
>
> Thanks for your help.
>
>
> On Tue, Nov 4, 2008 at 2:08 AM, Ali Nazemian <[EMAIL PROTECTED]>wrote:
>
>> Since we have just 22 nodes, I think a rescue disk would be reasonable.
>> I'll do that.
>> I also want to install some software on the nodes, such as Fluent, Ansys,
>> etc. How can I put these packages in the image file instead of installing them
>> on the nodes one by one?
>> Cheers.
>>
>>
>> On Tue, Nov 4, 2008 at 1:05 AM, Michael Edwards <[EMAIL PROTECTED]>wrote:
>>
>>> hd0 is just a label for one of the hard drives, that is fairly normal.
>>>  For some reason the nodes aren't liking the drives that OSCAR picked
>>> for your nodes to use.
>>>
>>> What you'll need to do is boot a rescue CD on a node and copy the
>>> modprobe.conf file it uses (assuming it boots properly and can mount
>>> the node's disk) to overwrite the one made by OSCAR.
>>>
>>> Then use the tips here
>>> (http://svn.oscar.openclustergroup.org/trac/oscar/wiki/TipLDAP) to
>>> change the OSCAR files used when imaging the nodes for future nodes.
>>>
>>> Then depending on if you have a lot of nodes, you can reimage the
>>> nodes or just repeat the rescue CD booting.  Fix the modprobe in the
>>> images though, in case you reimage them later.
>>>
>>> Generally, the way I work is to make changes to a node, note what
>>> changes I made that worked, update the image with those changes, then
>>> reimage the cluster if possible.  This depends on how many users you
>>> have of course :)
>>>
>>> On Sun, Nov 2, 2008 at 4:08 AM, Ali Nazemian <[EMAIL PROTECTED]> wrote:
>>> > Sorry, I forgot to write something in my last post, so add these sentences
>>> > to it:
>>> > On the client node boot menu I have 2 choices (obviously):
>>> > 2.6.9-78.ELsmp_(hd0,0)
>>> > 2.6.9-78.EL_(hd0,0)
>>> > What does hd0 mean? I haven't seen something like this in a Linux boot menu.
>>> > Anyway, I tried to edit the "2.6.9-78.ELsmp_(hd0,0)" entry, and a new page with 3
>>> > choices showed up:
>>> > root(hd0,0)
>>> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/sda6
>>> > initrd /sc-initrd-2.6.9-78.ELsmp.gz
>>> > I checked the server node, and for the server the boot menu (edit command) looks like this:
>>> > root(hd0,0)
>>> > kernel /vmlinuz-2.6.9-78.ELsmp ro root=/dev/VolGroup00/LogVol00 rhgb...
>>> > initrd /initrd-2.6.9-78.ELsmp.img
>>> > Is this usual, or am I in trouble?
>>> > FYI: I used the UYOK method.
>>> > Cheers.
>>> >
>>> > On Sun, Nov 2, 2008 at 12:15 PM, Ali Nazemian <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >> Hi again,
>>> >> I set the image post-install action to "beep" and started the installation process. After a
>>> >> while the installation on the client node seemed to finish and it was
>>> >> waiting for a reboot, so I reset the client node. After that I changed the boot
>>> >> priority to hard disk first. The client node then tried to boot CentOS from the hard
>>> >> disk, but these errors showed up:
>>> >>
>>> >> Mounted /proc filesystem
>>> >> Mounting sysfs
>>> >> Creating /dev
>>> >> Starting udev
>>> >> Loading jbd.ko module
>>> >> Loading ext3.ko module
>>> >> Creating root device
>>> >> Mounting root filesystem
>>> >> mount:error 6 mounting ext3
>>> >> mount:error 2 mounting none
>>> >> Switching to new root
>>> >> switchroot:mount failed:22
>>> >> unmount /initrd/dev failed:2
>>> >> kernel panic - not syncing : Attempted to kill init!
>>> >> -------------------
>>> >> On the server node, in the monitoring page, the installation of this node
>>> >> is still green and its progress is "beeping", so it seems it was successful
>>> >> up to this point.
>>> >> What is the problem? What should I do now?
>>> >> Cheers.
>>> >>
>>> >> On Sat, Nov 1, 2008 at 11:41 PM, Ali Nazemian <[EMAIL PROTECTED]>
>>> >> wrote:
>>> >>>
>>> >>> I checked what you said, and it turned out the firewall was enabled.
>>> >>> SSH was allowed, but the firewall itself was on, so this problem was solved by disabling
>>> >>> the firewall. After that some new errors showed up on the client node,
>>> >>> something about a partitioning problem that I think is
>>> >>> related to the ide.disk/scsi.disk file. So I have some questions about the imaging
>>> >>> process in the OSCAR installation that can probably help me install it
>>> >>> without any errors:
>>> >>> 1- In the "build the image" step we have to choose a disk partition file.
>>> >>> The instructions say to choose scsi.disk for SCSI disks and ide.disk for IDE disks,
>>> >>> but what about SATA disks? As you know, the hda device naming is for IDE and sda
>>> >>> is for SCSI. I saw that SATA uses sda too, so should I choose scsi.disk?
>>> >>> 2- In this step we also have to choose the IP assignment method. The default value
>>> >>> is static. Which one should I choose, static or dhcp? Which one is more
>>> >>> efficient? I think static is better; what do you think?
>>> >>> 3- Should the post-install action be reboot, beep, or something else? The
>>> >>> installation manual says we shouldn't choose reboot if we want to use
>>> >>> network boot installation. I don't know which one is better and less error-prone
>>> >>> for me.
>>> >>> 4- I found something: when I try to collect the MAC address of the client node
>>> >>> (I have just 2 nodes connected to the switch as a pilot project, one
>>> >>> as the server and the other as a client), the wrong MAC address is found. I think
>>> >>> it is the switch's MAC address, so I have to insert the client MAC address
>>> >>> manually. Do you think this can cause errors in the installation process?
>>> >>> Best regards.
>>> >>>
>>> >>> On Fri, Oct 31, 2008 at 1:57 AM, <[EMAIL PROTECTED]> wrote:
>>> >>>>
>>> >>>> Have you checked the headnode, to make sure that your firewall is not
>>> >>>> running?
>>> >>>>
>>> >>>> for a GUI to turn off the firewall: system-config-securitylevel
>>> >>>>
>>> >>>> On Tue, Oct 28, 2008 at 2:51 PM, ali nazemian <[EMAIL PROTECTED]> wrote:
>>> >>>>>
>>> >>>>> Hi.
>>> >>>>> I want to set up a cluster for our HPC center using OSCAR, but I have
>>> >>>>> a problem with step 7, installing the cluster.
>>> >>>>> Here is my problem:
>>> >>>>> When I run step 7, after some time a "tftp time out" error appears on the client node
>>> >>>>> and the node terminates the boot agent, and "Received
>>> >>>>> disconnect from 192.168.0.2: 2: The connection is closed by SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess" appears on the server node.
>>> >>>>> Here is the complete log of step 7:
>>> >>>>>
>>> >>>>>
>>> --------------------------------------------------------------------------
>>> >>>>> --> Update Wizard Env (as needed)
>>> >>>>> --> Step 7: Running: ./post_install
>>> >>>>> Gathering processor count from oscarnode1.clusternet.
>>> >>>>> ssh: connect to host oscarnode1.clusternet port 22: Connection
>>> timed
>>> >>>>> out
>>> >>>>> Improper count (0) returned from machine oscarnode1.clusternet at
>>> >>>>> ./post_install line 83
>>> >>>>>     main::get_numproc() called at ./post_install line 39
>>> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> --> About to run /opt/oscar/packages/loghost/scripts/post_install
>>> for
>>> >>>>> loghost
>>> >>>>> ************************* oscar_cluster *************************
>>> >>>>> --------- oscarnode1---------
>>> >>>>> ssh: connect to host oscarnode1 port 22: Connection timed out
>>> >>>>> --> About to run /opt/oscar/packages/ganglia/scripts/post_install
>>> for
>>> >>>>> ganglia
>>> >>>>> [ganglia] Ganglia gmond configuration file modified, re-starting daemon...
>>> >>>>> Shutting down GANGLIA gmond: [  OK  ]
>>> >>>>> Starting GANGLIA gmond: [  OK  ]
>>> >>>>> editing /etc/gmetad.conf
>>> >>>>> match: gridname\s+.*
>>> >>>>> match: data_source\s+.*
>>> >>>>> [ganglia] Ganglia gmetad configuration file modified, re-starting daemon...
>>> >>>>> Shutting down GANGLIA gmetad: [  OK  ]
>>> >>>>> Starting GANGLIA gmetad: [  OK  ]
>>> >>>>> [ganglia] Starting up apache...
>>> >>>>> Stopping httpd: [  OK  ]
>>> >>>>> Starting httpd: [  OK  ]
>>> >>>>> [ganglia] Ganglia page is located at http://server.clusternet/ganglia/
>>> >>>>> ************************* oscar_cluster *************************
>>> >>>>> --------- oscarnode1---------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> --> About to run /opt/oscar/packages/torque/scripts/post_install
>>> for
>>> >>>>> torque
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> TORQUE mom config file updated with clienthost: server.clusternet
>>> >>>>> Pushing config file to clients...
>>> >>>>> Sending SIGHUP to all moms...
>>> >>>>> ************************* oscar_cluster *************************
>>> >>>>> --------- oscarnode1---------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> [torque] Updating pbs_server nodes
>>> >>>>> /opt/pbs/bin/pbsnodes: Server has no node list
>>> >>>>> Shutting down TORQUE Server: [  OK  ]
>>> >>>>> Starting TORQUE Server: [  OK  ]
>>> >>>>> [torque] Creating TORQUE workq queue...
>>> >>>>> Max open servers: 4
>>> >>>>> set queue workq resources_max.ncpus = 0
>>> >>>>> set queue workq resources_max.nodect = 0
>>> >>>>> set queue workq resources_available.nodect = 0
>>> >>>>> set server resources_available.ncpus = 0
>>> >>>>> set server resources_available.nodect = 0
>>> >>>>> set server resources_available.nodes = 0
>>> >>>>> set server resources_max.ncpus = 0
>>> >>>>> set server resources_max.nodes = 0
>>> >>>>> set server scheduler_iteration = 60
>>> >>>>> set server log_events = 64
>>> >>>>> Shutting down MAUI Scheduler: [  OK  ]
>>> >>>>> Starting MAUI Scheduler: [  OK  ]
>>> >>>>> --> About to run /opt/oscar/packages/switcher/scripts/post_install
>>> for
>>> >>>>> switcher
>>> >>>>> Setting default for tag mpi ("lam-7.1.2")
>>> >>>>> Attribute successfully set; new attribute setting will be effective
>>> for
>>> >>>>> future shells
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> --> About to run
>>> /opt/oscar/packages/mta-config/scripts/post_install
>>> >>>>> for mta-config
>>> >>>>> ************************************ WARNING
>>> >>>>> ************************************
>>> >>>>> OSCAR could not set up the configuration for any mailing service on
>>> the
>>> >>>>> server.
>>> >>>>> The current version of the mta-config package in OSCAR only
>>> supports
>>> >>>>> the Postfix mail transfer agent (MTA).
>>> >>>>> It looks like you have another MTA installed (e.g, sendmail or
>>> exim);
>>> >>>>> as such,
>>> >>>>> please be aware that OSCAR will not automatically configure it.
>>> >>>>> ************************************ WARNING
>>> >>>>> ************************************
>>> >>>>> --> About to run /opt/oscar/packages/ntpconfig/scripts/post_install
>>> for
>>> >>>>> ntpconfig
>>> >>>>> Shutting down ntpd: [  OK  ]
>>> >>>>> Starting ntpd: [  OK  ]
>>> >>>>> ************************* oscar_cluster *************************
>>> >>>>> --------- oscarnode1---------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> --> About to run /opt/oscar/packages/opium/scripts/post_install for
>>> >>>>> opium
>>> >>>>> Not all hosts were accessible by c3! Will retry the update later
>>> >>>>> Could not find template for file switcher.ini
>>> >>>>> If this contains distro-specific lines, please create a template!
>>> >>>>> image:
>>> >>>>> $VAR1 = 'oscarimage';
>>> >>>>> ---------------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> Could not find template for file gshadow
>>> >>>>> If this contains distro-specific lines, please create a template!
>>> >>>>> image:
>>> >>>>> $VAR1 = 'oscarimage';
>>> >>>>> ---------------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> image:
>>> >>>>> $VAR1 = 'oscarimage';
>>> >>>>> ---------------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> image:
>>> >>>>> $VAR1 = 'oscarimage';
>>> >>>>> ---------------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> image:
>>> >>>>> $VAR1 = 'oscarimage';
>>> >>>>> ---------------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> rsync: connection unexpectedly closed (0 bytes received so far)
>>> >>>>> [sender]
>>> >>>>> rsync error: error in rsync protocol data stream (code 12) at
>>> io.c(359)
>>> >>>>> --> About to run /opt/oscar/packages/oda/scripts/post_install for
>>> oda
>>> >>>>> generating the /etc/odaserver file on all oscar clients
>>> >>>>> . /etc/profile.d/c3.sh && cexec 'echo oscar_server >
>>> /etc/odaserver'
>>> >>>>> ************************* oscar_cluster *************************
>>> >>>>> --------- oscarnode1---------
>>> >>>>> Received disconnect from 192.168.0.2: 2: The connection is closed
>>> by
>>> >>>>> SSH Server
>>> >>>>> Current FSM is SSH_Main_SSHProcess
>>> >>>>> Cluster setup complete!
>>> >>>>> --> Step 7: Successfully completed the cluster install
>>> >>>>> --> Update Wizard Env (as needed)
>>> >>>>>
>>> >>>>>
>>> -----------------------------------------------------------------------------------
>>> >>>>> P.S.: I am using OSCAR 5 on CentOS 4.7 x86_64. I can't use CentOS 5.x
>>> >>>>> because it has problems with my graphics cards.
>>> >>>>> Best regards.
>>> >>>>>
>>> >>>>> --
>>> >>>>> A.Nazemian
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> -------------------------------------------------------------------------
>>> >>>>> This SF.Net email is sponsored by the Moblin Your Move Developer's
>>> >>>>> challenge
>>> >>>>> Build the coolest Linux based applications with Moblin SDK & win
>>> great
>>> >>>>> prizes
>>> >>>>> Grand prize is a trip for two to an Open Source event anywhere in
>>> the
>>> >>>>> world
>>> >>>>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>> >>>>> _______________________________________________
>>> >>>>> Oscar-users mailing list
>>> >>>>> Oscar-users@lists.sourceforge.net
>>> >>>>> https://lists.sourceforge.net/lists/listinfo/oscar-users
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> A.Nazemian
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> A.Nazemian
>>> >
>>> >
>>> >
>>> > --
>>> > A.Nazemian
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>> --
>> A.Nazemian
>>
>
>
>
> --
> A.Nazemian
>



-- 
A.Nazemian