Thanks, I think it is finally gone!

As it turns out, I needed to install on the server, the clients, and the image.  I must
have "completed" the installation with the buggy version at some point, or interrupted
the process before ODA could be updated.  It's a good thing chroot is not nearly as
complicated as I thought it was.

I had neglected to install the client and GUI packages on the server, and I also had
some occasional burps due to bad X11 forwarding.  I commented out the 'ForwardX11 yes'
line in /etc/ssh/ssh.conf and they went away.
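If you need to do the same thing on several machines, commenting that line out can be scripted.  This is a minimal sketch, assuming GNU sed (for in-place `-i` editing) and that the directive is written as `ForwardX11 yes` on its own line.  The function name `disable_forwardx11` is hypothetical, and the config file is passed as an argument so you can try it on a copy first:

```shell
#!/bin/sh
# Hypothetical helper: comment out any active "ForwardX11 yes" line in the
# SSH client config file given as $1. Assumes GNU sed (-i edits in place).
# Lines that already start with "#" do not match, so it is safe to re-run.
disable_forwardx11() {
  conf="$1"
  # Prefix the matching line with "#", keeping its original indentation.
  sed -i 's/^\([[:space:]]*ForwardX11[[:space:]]\{1,\}yes\)/#\1/' "$conf"
}
```

Run it against a copy of the config and diff the result against the original before touching the real file.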

Original Message -----------------------
If you started a completely fresh install, then yes, you would need to run
the script again.  Otherwise, if the OSCAR server was left intact, it's not
necessary.  It's re-runnable anyway, so there's no risk.  Also, if you start
a fresh install, it would be best to skip installing OpenPBS altogether
and get the OPD versions of CluMon, Torque, and Maui installed together.

         Jeremy

At 10:09 AM 6/3/2004, Michael Edwards wrote:
>Ok, I don't think the chroot worked at all (I hope; if it did something
>and didn't finish, I think I will just call it a loss and start over), so I
>installed what I thought were the expected packages on the server and
>nodes.  Here's the score after I finally figured out how cpush worked.
>
>[EMAIL PROTECTED] RPMS]# rpm -qa | grep openpbs
>openpbs-oscar-mom-2.3.16-11
>openpbs-oscar-server-2.3.16-11
>openpbs-oscar-2.3.16-11
>
>--------- oscarnode001.brooks.af.mil---------
>openpbs-oscar-2.3.16-11
>openpbs-oscar-mom-2.3.16-11
>openpbs-oscar-client-2.3.16-11
>
>So now should I run the fix script again before I try to uninstall?
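To see at a glance which openpbs packages differ between the server and a node, you could save each of the listings above to a file (dropping the `---- hostname ----` header line from the node output) and compare them with `comm`.  A small sketch; the helper name `pkg_diff` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: show the packages unique to each of two `rpm -qa`
# listings. Column 1 = only in the first file, column 2 (tab-indented) =
# only in the second. Packages present in both are suppressed by -3.
pkg_diff() {
  a=$(mktemp) b=$(mktemp)
  sort "$1" > "$a"   # comm needs sorted input
  sort "$2" > "$b"
  comm -3 "$a" "$b"
  rm -f "$a" "$b"
}
```

With the two listings above, it would print `openpbs-oscar-server-2.3.16-11` in the first column and `openpbs-oscar-client-2.3.16-11` in the second.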
>
>Original Message -----------------------
>Arrgh.  I had that problem once too, and I don't remember what it
>was.  There are several ways to limp it along, depending on what
>happened.  The uninstallation of an OSCAR package has 3 main phases:
>uninstall from image
>uninstall from clients
>uninstall from server
>
>The same uninstall script that is run on the clients is run chrooted in the
>image.  If any one of these phases fails, the uninstall fails.  However, that
>doesn't mean that *none* of the phases worked; e.g., the image uninstall
>may have succeeded while the others failed... choose your
>combination.  This is where it gets kind of hairy.  Even if the problem
>that caused the failures gets fixed, an uninstall attempt on a phase that
>has already been successfully uninstalled will result in an overall
>failure.  This will happen perpetually once the phases get out of sync with
>each other.
>So... at that point, you have two options.
>1)  Manually reinstall the RPMs on the server, image, or clients...
>whichever ones apparently succeeded in removal.  That will sync things up so
>you can attempt to remove the package again.
>2)  Manually complete the uninstallation in the phases where the RPMs remain...
>and then correct the state of the database to reflect that they have been
>uninstalled.  I would supply the ODA commands to accomplish this, but I
>don't know them offhand.  Perhaps Neil Gorsuch (ODA author) or John Mugler
>(uninstall author) can supply the additional information here, i.e. the ODA
>commands necessary to manually change a given package's database state so
>that it's no longer installed.
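The bookkeeping behind these two options can be sketched as a small shell function.  This is only a model of the logic described above, not OSCAR's actual uninstall code: `plan_resync` is a hypothetical name, and each argument records whether a phase's RPMs are still `installed` or already `removed`:

```shell
#!/bin/sh
# Hypothetical model of the three uninstall phases (image, clients, server).
# Each argument is phase=state, where state is "installed" or "removed".
# An uninstall only succeeds when every phase still has the package, so any
# phase already "removed" must be reinstalled (option 1) before retrying.
plan_resync() {
  removed=""
  installed=""
  for pair in "$@"; do
    phase=${pair%%=*}
    state=${pair#*=}
    case "$state" in
      removed)   removed="$removed $phase" ;;
      installed) installed="$installed $phase" ;;
      *) echo "unknown state for $phase: $state" >&2; return 2 ;;
    esac
  done
  if [ -z "$removed" ]; then
    echo "in sync: retry the uninstall"
  elif [ -z "$installed" ]; then
    # RPMs are gone everywhere; only the ODA database state is stale (option 2).
    echo "all removed: correct the ODA database state"
  else
    echo "reinstall RPMs in:$removed"
  fi
}
```

For example, `plan_resync image=removed clients=installed server=installed` prints `reinstall RPMs in: image`.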
>
>Oh... and to find out which phase is giving you the trouble w/ uninstall...
>look at the early output in the terminal after you hit "Execute".  The
>action that takes place after hitting "Cancel" is a known bug, btw.
>
>          Jeremy
>
>At 03:22 PM 5/28/2004, Michael Edwards wrote:
> >Ok, here's what I did after I ran the script you sent me as root (I don't
> >think it worked).  The script itself appeared to work.
> >
> >First I downloaded Maui and Torque with OPD.  At least I think I did, but
> >they didn't show up under the Install/Uninstall button the way Ganglia
> >did.  It could be a proxy issue, but I got the listing from OPD, so I don't
> >think it should be.
> >
> >Anyway, then I uninstalled Maui, which appeared to work fine.  Then I
> >tried to uninstall PBS, but it froze during a chroot to run the
> >post-uninstall script on the image.  I let it sit for about 20 minutes, then
> >pressed Ctrl+C to kill the process and tried again.  The Maui box was unchecked
> >and black (indicating it was done, I assume) and the PBS box was unchecked
> >but red (indicating it hadn't actually uninstalled yet).  Torque still
> >didn't show up on the list at all.  Whether I clicked Execute or Cancel, it
> >appeared to run the same series of post-install scripts.  This error
> >showed up a lot...
> >
> >Use of uninitialized value in pattern match (m//) at
> >/usr/lib/perl5/site_perl/oda.pm line 2930.
> >
> >Anyway, if worst comes to worst, I will just back up my files and start
> >over, but I was hoping you might know what happened.
> >
> >Original Message -----------------------
> >Sorry: upon testing, it appears that the CluMon RPM also has a PBS
> >dependency, and would also need to be
> >uninstalled/re-installed.  Uninstallations of packages must occur
> >separately, but I believe multiple packages can be added at once.
> >
> >          Jeremy
> >
> >At 05:38 PM 5/25/2004, Jeremy Enos wrote:
> > >Ok... so instead of writing down which changes to make, I decided just to
> > >script it to make it simpler for everyone.  Just run the attached script
> > >as root on your head node (it's re-runnable, btw), and it should fix
> > >things up so you can uninstall PBS.  However, you will need to uninstall
> > >Maui FIRST.  The Maui RPMs depend on PBS, so they will need to be uninstalled
> > >and re-installed.  This needs to be done in a separate step to avoid
> > >dependency failures, because the package uninstall does not take dependencies
> > >into account when several packages are uninstalled at once.  :(
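The ordering rule here is a small topological sort: anything that depends on PBS has to come out before PBS itself.  As a sketch, the standard `tsort` utility can compute a safe uninstall order; each input pair `A B` means "uninstall A before B".  The names are just labels rather than exact RPM names, and CluMon's PBS dependency is the one noted elsewhere in this thread:

```shell
#!/bin/sh
# Each line "A B" means A depends on B, so A must be uninstalled first.
# tsort prints one name per line in an order consistent with every pair.
uninstall_order() {
  printf '%s\n' \
    "maui pbs" \
    "clumon pbs" | tsort
}
```

PBS always comes out last; the relative order of Maui and CluMon does not matter.  Each removal would still be run as its own step, since the uninstall does not do this ordering when several packages are selected at once.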
> > >
> > >On the plus side, there is a more recent version of Maui available in OPD
> > >as well... you can use this as an opportunity to upgrade Maui at the same
> > >time as you upgrade PBS to Torque.
> > >
> > >I've cc'd oscar-users for reference.
> > >
> > >         Jeremy
> > >
> > >At 09:21 AM 5/25/2004, Michael Edwards wrote:
> > >>After chatting with some folks at HPCS, I am interested in removing PBS
> > >>and putting in either Torque or the "grid" one that Bernard Li was
> > >>packaging.  You have mentioned a bug in the removal scripts a couple of
> > >>times and I was wondering what that was.
> > >>
> > >>Thanks again for your help.
> > >>
> > >>Original Message -----------------------
> > >>At 03:41 AM 5/25/2004, Carlos Vasco Ortiz wrote:
> > >> >Hi Jeremy,
> > >> >>
> > >> >>You'll have to be more specific here... I'm not sure what you mean.
> > >> >By resource manager extensions I mean the way to bring some information to
> > >> >the scheduler not covered by PBS (the resource manager). The Maui
> > >> >documentation says that this can be done via the -W flag in
> > >> >the PBS scripts, but this has to be implemented in the configuration of
> > >> >PBS. I don't know if this is already done in the OSCAR distribution of PBS.
> > >> >>
> > >> >>
> > >> >>>2.- Since our cluster is quite heterogeneous in CPU speed (at least 3
> > >> >>>clock speeds) and we have 2 different switches, we would like our
> > >> >>>parallel codes to be spread across CPUs of the same speed, all of them
> > >> >>>connected to the same switch. I know how to apply each constraint
> > >> >>>separately by means of nodesets, but I don't know how to impose
> > >> >>>both constraints at the same time. Is it possible to define two nodesets
> > >> >>>and select the nodes from the intersection of both? Is there
> > >> >>>another way to impose that?
> > >> >>
> > >> >>
> > >> >>I think there is a way to do this w/ PBS.  You will need to manually add
> > >> >>"resource" descriptions on each node, and then specify all the resources
> > >> >>on the job submission line (qsub) that you wish your nodes to match.  So
> > >> >>this basically means:
> > >> >>
> > >> >>Edit your /var/spool/pbs/server_priv/nodes file to add the resource
> > >> >>descriptions and restart the server.
> > >> >>Then, on your job submit command:
> > >> >>qsub -l nodes=6:ppn=2:clockA:switchB job_script.pbs
> > >> >>(assuming clockA and switchB are resource descriptions you've used)
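As an illustration, a nodes file carrying such properties might look like the sample embedded below.  Host names and property names are made up, and `np=` sets the processors per node.  The awk filter lists which nodes carry every requested property, which is roughly the intersection that `-l nodes=...:clockA:switchB` asks for.  It is a sketch for sanity-checking your own file, not PBS's scheduler logic:

```shell
#!/bin/sh
# Hypothetical sample of a server_priv/nodes file: host name, np=<cpus>,
# then free-form properties (names invented for this example).
NODES_FILE=$(mktemp)
cat > "$NODES_FILE" <<'EOF'
node01 np=2 clockA switchA
node02 np=2 clockA switchB
node03 np=2 clockB switchB
node04 np=2 clockA switchB
EOF

# Hypothetical helper: print the nodes whose property list contains ALL of
# the properties given as arguments (e.g. the intersection clockA + switchB).
nodes_matching() {
  awk -v want="$*" '
    BEGIN { n = split(want, req, " ") }
    /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
    {
      ok = 1
      for (i = 1; i <= n; i++) {
        found = 0
        for (f = 2; f <= NF; f++) if ($f == req[i]) found = 1
        if (!found) ok = 0
      }
      if (ok) print $1
    }' "$NODES_FILE"
}
```

Here `nodes_matching clockA switchB` prints `node02` and `node04`, the same set a `qsub -l nodes=...:clockA:switchB` request would be limited to.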
> > >> >>
> > >> >About this point: I already have this solution implemented, but the user
> > >> >has to choose the resource he wants to run on, and once you have submitted
> > >> >your job, you cannot use a different resource. Sometimes you may need 2
> > >> >nodes and have selected clock A, but clock B is also OK and has some
> > >> >nodes free before clock A does, so you are losing some cycles...
> > >>
> > >>If clockB is also ok, then there should be a third group representing all
> > >>nodes that would be ok... such as clockAandB or something.  PBS resources
> > >>can certainly overlap each other.
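For example, every node could be tagged with both its clock-specific property and a shared one, so a job that can run at either speed asks for the shared property instead.  The names and the helper `count_with` are hypothetical; the check just counts how many nodes each property covers, to show the groups overlapping:

```shell
#!/bin/sh
# Hypothetical nodes file where clockAandB overlaps both clock groups.
NODES_FILE=$(mktemp)
cat > "$NODES_FILE" <<'EOF'
node01 np=2 clockA clockAandB switchB
node02 np=2 clockA clockAandB switchB
node03 np=2 clockB clockAandB switchB
EOF

# Count the nodes carrying a given property (fields after the host name).
count_with() {
  awk -v p="$1" '{ for (f = 2; f <= NF; f++) if ($f == p) { c++; break } }
    END { print c + 0 }' "$NODES_FILE"
}
```

A job that can use either speed would then be submitted with something like `qsub -l nodes=2:ppn=2:clockAandB:switchB job_script.pbs`.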
> > >>
> > >> >  With nodesets from Maui, you can select any set,
> > >> >but all the nodes you are given are from the same set. This is what I
> > >> >need, but I have to use two nodesets at the same time, and I don't know if
> > >> >this is possible.
> > >>
> > >>I can't help you much here... I'm not too familiar w/ Maui nodesets.
> > >>
> > >> >>
> > >> >>Also, if installing a new OSCAR cluster, I recommend leaving out the
> > >> >>included PBS package and using the Scalable PBS (a.k.a. Torque)
> > >> >>instead.  You can get it from OPD during installation, along w/ the
> > >> >>CluMon package.
> > >> >We have the cluster already installed and running in a production
> > >> >environment. We are waiting for a new server (with disk) for a new, different
> > >> >development cluster, so we will be able to try Torque there.
> > >>
> > >>If you end up trying to remove PBS on any existing cluster, please consult
> > >>us here first.  There is a minor bug in its uninstall scripts in
> > >>OSCAR 3.0.  Also, if CluMon is installed, it would need to be removed and
> > >>re-installed as well.  This is not a problem on a fresh installation with
> > >>CluMon and Torque downloaded from OPD, though.
> > >>
> > >>          Jeremy
> > >>
> > >> >Thank you very much for your help,
> > >> >
> > >> >Carlos
> > >> >--
> > >> >Carlos Vasco Ortiz (ITP Tecnología y Métodos)
> > >> >Tel: 34 91 207 91 21   [ITP-only internal ext.: 91 21]
> > >> >Fax: 34 91 207 94 11
> > >> >mailto:[EMAIL PROTECTED]
> > >



-------------------------------------------------------
This SF.Net email is sponsored by the new InstallShield X.
>From Windows to Linux, servers to mobile, InstallShield X is the one
installation-authoring solution that does it all. Learn more and
evaluate today! http://www.installshield.com/Dev2Dev/0504
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users


