On 09.05.2011, at 11:42, [email protected] wrote:
One solution could be to run 3 VMs per exechost (and each exechost appears
triple in SGE), but only one VM at a time might receive jobs?
Hm, I read this too quickly last time. It might or might not work to have
several VMs, but it implies changing our test environment more than we
were planning to.
I thought of the free ESXi, which runs directly on the bare metal. It
shouldn't have a high impact on the machine.
[That would be much more convenient if you could do mutual suspension
between hosts, as we've said before.]
That's unfortunately not possible, partly because we need all the power
from the servers to run the application without timing issues.
If it must run on bare metal, and can't run stateless (when booting might
be faster than re-provisioning), is there some reason not to have multiple
configurations installed, and chroot into them for the job concerned?
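For illustration only, such a per-job chroot could look roughly like this
in a job script or prolog (the /opt/configs tree, the CONFIG variable and
run_test.sh are made-up names, just an assumption about the setup):

    #!/bin/sh
    # pick the prepared root tree for the requested configuration
    CONFIG=${CONFIG:-baseline}    # e.g. passed along via "qsub -v CONFIG=cfgA"
    # run the actual test inside that root (chroot itself needs root/sudo)
    sudo chroot /opt/configs/$CONFIG /bin/sh -c /opt/tests/run_test.sh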
I guess we could do several things with our servers and how those
configurations are handled. But our hope is that we can start using GE
without changing things around too much; it would probably be a lot of
work to start using some other model.
The "Configuration" part involves starting our simulated server
(running
on the real server) and setting up a simulated test environment
around it.
We must have the simulated server running between test cases to save
time.
And the main problem seems general enough to have a solution: several jobs
depend on some common precondition that is tied to an execution host. You
want to fulfil the precondition in a way that minimizes the work, and you
want it to happen automatically.
But if we must provide some custom logic to do this, what would be a
good
strategy? What do you think about this:
Make a complex on the execution hosts (or queue instances?) that describes
the configuration that is active on that server. Make a timer-triggered
script that examines the jobs in the queue at a certain interval. Let the
script run the configuration procedure on the test servers when it decides
that it is appropriate, and then update the config complex.
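Roughly, I imagine something like this for the timer-triggered script
(only a sketch; the complex name "config", the host name and
reconfigure.sh are assumptions on my part):

    #!/bin/sh
    # assumes a host-level string complex "config" added via "qconf -mc", e.g.
    #   config  cfg  RESTRING  ==  YES  NO  NONE  0
    # and jobs submitted with something like "qsub -l config=cfgA ..."
    HOST=testserver01
    # configuration requested by the first pending job
    WANTED=`qstat -s p -r | sed -n 's/.*config=\([^ ,(]*\).*/\1/p' | head -1`
    # configuration the exec host currently advertises
    CURRENT=`qconf -se $HOST | sed -n 's/.*config=\([^ ,]*\).*/\1/p'`
    if [ -n "$WANTED" ] && [ "$WANTED" != "$CURRENT" ]; then
        /opt/tests/reconfigure.sh "$WANTED" && \
            qconf -mattr exechost complex_values config=$WANTED $HOST
    fi

The script would then run from cron on the qmaster (or another admin
host), since qconf -mattr needs manager rights.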
This would mean knowing beforehand what will be scheduled next. What about
an approach Cray is using with their `aprun` command to place jobs in the
cluster:
- all jobs run locally on the head node
- there you can use a custom start script, or put it in the job script, to
prepare a real node
- send the stuff to the node
As jobs run exclusively, you can just reboot the exechost the next time to
get rid of all old stuff.
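Only as a sketch of that idea (assuming passwordless ssh from the head
node to the test servers; the host name, setup_configuration.sh and
run_test.sh are placeholders):

    #!/bin/sh
    #$ -l h=headnode                 # the job itself stays on the head node
    NODE=testserver01
    # prepare the real node for this test's configuration
    ssh $NODE /opt/tests/setup_configuration.sh cfgA
    # send the actual work to the node
    ssh $NODE /opt/tests/run_test.sh
    # optionally reboot the exec host afterwards to get rid of old state
    ssh $NODE sudo reboot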
-- Reuti