Thanks, I think it is finally gone!
As it turns out, I needed to install on the server, client, and images. I must have
"completed" the installation with the bugged version or interrupted the process before
ODA could be updated at some point. It's a good thing chroot is not nearly as
complicated as I thought it was.
I had neglected to install the client and gui packages on the server, and also had
some occasional burps due to bad X11 forwarding. I commented out the 'ForwardX11 yes'
line in /etc/ssh/ssh_config and they went away.
Original Message -----------------------
If you started a completely fresh install, then yes you would need to run
the script again. Otherwise, if the oscar server was left intact, it's not
necessary. It's re-runnable anyway, so no risk there. Also, if you start
a fresh install, it would be best to just skip installing openpbs
altogether and get the OPD versions of clumon, torque, and Maui all
installed together.
Jeremy
At 10:09 AM 6/3/2004, Michael Edwards wrote:
>Ok, I don't think the chroot worked at all (I hope; if it did something
>and didn't finish, I think I will just call it a loss and start over), so I
>installed what I thought were the expected packages on the server and
>nodes. Here's the score after I finally figured out how cpush worked.
>
>[EMAIL PROTECTED] RPMS]# rpm -qa | grep openpbs
>openpbs-oscar-mom-2.3.16-11
>openpbs-oscar-server-2.3.16-11
>openpbs-oscar-2.3.16-11
>
>--------- oscarnode001.brooks.af.mil---------
>openpbs-oscar-2.3.16-11
>openpbs-oscar-mom-2.3.16-11
>openpbs-oscar-client-2.3.16-11
>
>So now should I run the fix script again before I try to uninstall?
>
>Original Message -----------------------
>Arrgh. I had that problem once too, and I don't remember what it
>was. There are several ways to limp it along, depending on what
>happened. The uninstallation of an OSCAR package has 3 main phases:
>uninstall from image
>uninstall from clients
>uninstall from server
>
>The same uninstall script that is run on the clients is run chrooted in the
>image. If one of these phases fails, the uninstall fails. However, that
>doesn't mean that *none* of the phases worked. i.e. The image uninstall
>may have succeeded, and the others may have failed... choose your
>combination. This is where it gets kind of hairy. Even if the problem
>which caused the failures gets fixed, an uninstall attempt on a phase which
>has already been successfully uninstalled will result in a failure
>overall. This will happen perpetually when the phases get out of sync with
>each other.
>So... at that point, you have two options.
>1) Manually reinstall the rpms on the server, image, or clients...
>whichever one apparently succeeded in removal. That will sync things up so
>you can attempt to remove it again.
>2) Manually complete the uninstallation in the phases where they remain...
>and then correct the state of the database to reflect that they have been
>uninstalled. I would supply the ODA commands to accomplish this, but I
>don't know them off hand. Perhaps Neil Gorsuch (ODA author) or John Mugler
>(uninstall author) can supply the additional information here... i.e. ODA
>commands necessary to manually change a given package's database state so
>that it's no longer installed.
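The two options above can be sketched as a small decision helper. This is only an illustration under stated assumptions: the function name `sync_plan` and the dictionary layout are hypothetical, it does not call rpm, cpush, or ODA, and it assumes you have already checked by hand (for example with `rpm -qa | grep openpbs` on each target) whether the package is still present in each phase.

```python
# Hypothetical helper: names and data layout are illustrative, not part of OSCAR.
PHASES = ("image", "clients", "server")

def sync_plan(installed):
    """Given {phase: True/False} for whether the package rpms are still
    present in each uninstall phase, report what each option would touch."""
    present = [p for p in PHASES if installed[p]]
    removed = [p for p in PHASES if not installed[p]]
    if not removed:
        # All three phases still have the package: a normal uninstall can be retried.
        return {"state": "installed", "reinstall_on": [], "remove_from": []}
    if not present:
        # Nothing left anywhere; only the database state may still need correcting.
        return {"state": "removed", "reinstall_on": [], "remove_from": []}
    # Mixed state: option 1 reinstalls where removal succeeded,
    # option 2 finishes the removal where the package remains.
    return {"state": "out_of_sync", "reinstall_on": removed, "remove_from": present}

if __name__ == "__main__":
    # Example: the image uninstall succeeded, the other two phases did not.
    print(sync_plan({"image": False, "clients": True, "server": True}))
```

For option 2, the database state would still have to be corrected afterwards, which this sketch does not attempt.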
>
>Oh... and to find out which phase is giving you the trouble w/ uninstall...
>look at the early output in the terminal after you hit "Execute". The
>action that takes place after hitting "Cancel" is a known bug, btw.
>
> Jeremy
>
>At 03:22 PM 5/28/2004, Michael Edwards wrote:
> >Ok, here's what I did after I ran, as root, the script you sent me. I
> >don't think it worked, although the script itself appeared to work.
> >
> >First I downloaded maui and torque with opd. At least I think I did, but
> >they didn't show up on the Install/Uninstall button like ganglia
> >did. Could be a proxy issue, but I got the listing from opd so I don't
> >think it should be.
> >
> >Anyway, then I uninstalled maui, which appeared to work fine. Then I
> >tried to uninstall pbs but it froze up during a chroot to run the post
> >uninstall script on the image. I let it sit for like 20 min and then
> >control+c to kill the process and tried again. The maui box was unclicked
> >and black (indicating it was done I assume) and the pbs box was unclicked
> >but red (indicating it hadn't actually uninstalled yet). Torque still
> >didn't show up on the list at all. Whether I executed or canceled, it
> >appeared to run the same series of post-install scripts. This error
> >showed up a lot...
> >
> >Use of uninitialized value in pattern match (m//) at
> >/usr/lib/perl5/site_perl/oda.pm line 2930.
> >
> >Anyway, if worst comes to worst I will just back up my files and start over,
> >but I was hoping you might know what happened.
> >
> >Original Message -----------------------
> >Sorry- upon testing, it appears that the CluMon rpm also has a PBS
> >dependency, and would also need to be
> >uninstalled/re-installed. Uninstallations of packages must occur
> >separately, but I believe multiple packages can be added at once.
> >
> > Jeremy
> >
> >At 05:38 PM 5/25/2004, Jeremy Enos wrote:
> > >Ok... so instead of writing down which changes to make, I decided just to
> > >script it to make it simpler for everyone. Just run the attached script
> > >as root on your head node (it's re-runnable, btw), and it should fix
> > >things up so you can un-install PBS. However, you will need to un-install
> > >Maui FIRST. Maui rpms depend on PBS, so they will need to be uninstalled,
> > >and re-installed. It needs to be done in a separate step to avoid
> > >dependency failures, as the package uninstall does not take dependencies
> > >into account for multiple package uninstalls. :(
> > >
> > >On the plus side, there is a more recent version of Maui available in OPD
> > >as well... you can use this as an opportunity to upgrade Maui at the same
> > >time as you upgrade PBS to Torque.
> > >
> > >I've cc'd oscar-users for reference.
> > >
> > > Jeremy
> > >
> > >At 09:21 AM 5/25/2004, Michael Edwards wrote:
> > >>After chatting with some folks at HPCS I am interested in removing PBS
> > >>and putting in either Torque or the "grid" one that Bernard Li was
> > >>packaging. You have mentioned a bug in the removal scripts a couple
> > >>times and I was wondering what that was.
> > >>
> > >>Thanks again for your help.
> > >>
> > >>Original Message -----------------------
> > >>At 03:41 AM 5/25/2004, Carlos Vasco Ortiz wrote:
> > >> >Hi Jeremy,
> > >> >>
> > >> >>You'll have to be more specific here... I'm not sure what you mean.
> > >> >By resource manager extensions I mean the way to pass the scheduler
> > >> >some information not covered by PBS (the resource manager). The Maui
> > >> >documentation says that this can be done via the -W flag in
> > >> >the pbs scripts, but this has to be implemented in the configuration of
> > >> >PBS. I don't know if this is already done in the OSCAR distribution of PBS.
> > >> >>
> > >> >>
> > >> >>>2.- Since our cluster is quite heterogeneous in CPU speed (at least 3
> > >> >>>clock speeds) and we have 2 different switches, we would like our
> > >> >>>parallel codes to be spread across CPUs of the same speed, all
> > >> >>>connected to the same switch. I know how to impose each constraint
> > >> >>>separately by means of nodesets, but I don't know how to impose
> > >> >>>both constraints at the same time. Is it possible to define two
> > >> >>>nodesets and select the nodes from the intersection of both? Is there
> > >> >>>another way to impose that?
> > >> >>
> > >> >>
> > >> >>I think there is a way to do this w/ PBS. You will need to manually add
> > >> >>"resource" descriptions on each node, and then specify all the resources
> > >> >>on the job submission line (qsub) that you wish your nodes to match. So
> > >> >>this basically means:
> > >> >>
> > >> >>Edit your /var/spool/pbs/server_priv/nodes file to add the resource
> > >> >>descriptions and restart the server.
> > >> >>Then, on your job submit command:
> > >> >>qsub -l nodes=6:ppn=2:clockA:switchB job_script.pbs
> > >> >>(assuming clockA and switchB are resource descriptions you've used)
> > >> >>
> > >> >About this point, I already have this solution implemented, but the user
> > >> >has to choose the resource he wants to run on, and once you have submitted
> > >> >your job, you cannot use any different resource. Sometimes you need 2
> > >> >nodes and have selected clock A, but clock B is also OK and has some
> > >> >nodes free before clock A does, so you are losing some cycles...
> > >>
> > >>If clockB is also ok, then there should be a third group representing all
> > >>nodes that would be ok... such as clockAandB or something. PBS resources
> > >>can certainly overlap each other.
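A rough sketch of the overlapping-group idea, for illustration only. The node names, the property names (clockAB in particular), and the sample nodes-file contents are made up, and the matching below merely imitates "a node must carry every requested property"; it is not how PBS itself selects nodes.

```python
# Hypothetical nodes-file contents: each node carries its own clock group,
# an overlapping "clockAB" group where either speed is acceptable, and a switch.
NODES_FILE = """\
node01 np=2 clockA clockAB switchA
node02 np=2 clockA clockAB switchB
node03 np=2 clockB clockAB switchB
node04 np=2 clockC switchB
"""

def parse_nodes(text):
    """Return {node_name: set_of_properties}, skipping key=value attributes."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        nodes[fields[0]] = {f for f in fields[1:] if "=" not in f}
    return nodes

def matching(nodes, *wanted):
    """Names of nodes that carry every requested property."""
    return sorted(name for name, props in nodes.items() if set(wanted) <= props)

if __name__ == "__main__":
    nodes = parse_nodes(NODES_FILE)
    print(matching(nodes, "clockA", "switchB"))   # strict request
    print(matching(nodes, "clockAB", "switchB"))  # relaxed, overlapping group
```

With a layout like this, a job that can tolerate either speed would request the overlapping group in place of clockA on the qsub line shown earlier.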
> > >>
> > >> > With nodesets from Maui, you can select any set,
> > >> >but all the nodes you are given are from the same set. This is what I
> > >> >need, but I have to use two nodesets at the same time, and I don't know if
> > >> >this is possible.
> > >>
> > >>I can't help you much here... I'm not too familiar w/ Maui nodesets.
> > >>
> > >> >>
> > >> >>Also, if installing a new OSCAR cluster, I recommend leaving out the
> > >> >>included PBS package and using the Scalable PBS (a.k.a. Torque)
> > >> >>instead. You can get it from OPD during installation, along w/ the
> > >> >>CluMon package.
> > >> >We have the cluster already installed and running in a production
> > >> >environment. We are waiting for a new server (with disk) for a new, different
> > >> >development cluster, so we will be able to try Torque in this case.
> > >>
> > >>If you end up trying to remove PBS on any existing cluster, please consult
> > >>us here first. There is a minor bug in the uninstall scripts for it in
> > >>OSCAR 3.0. Also, if CluMon is installed, it would need to be removed and
> > >>re-installed as well. This is not a problem on a fresh installation with
> > >>CluMon and Torque downloaded from OPD though.
> > >>
> > >> Jeremy
> > >>
> > >> >Thank you very much for your help,
> > >> >
> > >> >Carlos
> > >> >--
> > >> >Carlos Vasco Ortiz (ITP Tecnología y Métodos)
> > >> >Tel: 34 91 207 91 21 [ITP-only internal ext.: 91 21]
> > >> >Fax: 34 91 207 94 11
> > >> ><mailto:[EMAIL PROTECTED]>mailto:[EMAIL PROTECTED]
> > >
-------------------------------------------------------
This SF.Net email is sponsored by the new InstallShield X.
From Windows to Linux, servers to mobile, InstallShield X is the one
installation-authoring solution that does it all. Learn more and
evaluate today! http://www.installshield.com/Dev2Dev/0504
_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users