On 09.05.2011, at 11:42, [email protected] wrote:
One solution could be to run 3 VMs per exechost (and each exechost appears
triple in SGE), but only one VM at a time might receive jobs?
Hm, I read this too quickly last time. It might or might not work to have
several VMs, but it implies changing our test environment more than we
were planning to.
I thought of the free ESXi, which runs directly on the bare metal. It
shouldn't have a high impact on the machine.
[That would be much more convenient if you could do mutual suspension
between hosts, as we've said before.]
That's unfortunately not possible, partly because we need all the power
from the servers to run the application without timing issues.
If it must run on bare metal, and can't run stateless (when booting might
be faster than re-provisioning), is there some reason not to have multiple
configurations installed, and chroot into them for the job concerned?
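For illustration only, such a per-job chroot could look roughly like this
in a job script or prolog (the /opt/configs tree, the CONFIG variable and
run_test.sh are made-up names, just an assumption about the setup):

    #!/bin/sh
    # pick the prepared root tree for the requested configuration
    CONFIG=${CONFIG:-baseline}    # e.g. passed along via "qsub -v CONFIG=cfgA"
    # run the actual test inside that root (chroot itself needs root/sudo)
    sudo chroot /opt/configs/$CONFIG /bin/sh -c /opt/tests/run_test.sh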
I guess we could do several things with our servers and how those
configurations are handled. But our hope is that we can start using GE
without changing things around too much; it would probably be a lot of
work to start using some other model.
The "Configuration" part involves starting our simulated server
(running
on the real server) and setting up a simulated test environment
around it.
We must have the simulated server running between test cases to save
time.
And the main problem seems general enough to have a solution: several jobs
depend on some common precondition that is tied to an execution host. You
want to fulfil the precondition in a way that minimizes the work, and you
want it to happen automatically.
But if we must provide some custom logic to do this, what would be a
good
strategy? What do you think about this:
Make a complex on the execution hosts (or queue instances?) that describes
the configuration that is active on that server. Make a timer-triggered
script that examines the jobs in the queue at a certain interval. Let the
script run the configuration procedure on the test servers when it decides
that it is appropriate, and then update the config complex.
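Roughly, I imagine something like this for the timer-triggered script
(only a sketch; the complex name "config", the host name and
reconfigure.sh are assumptions on my part):

    #!/bin/sh
    # assumes a host-level string complex "config" added via "qconf -mc", e.g.
    #   config  cfg  RESTRING  ==  YES  NO  NONE  0
    # and jobs submitted with something like "qsub -l config=cfgA ..."
    HOST=testserver01
    # configuration requested by the first pending job
    WANTED=`qstat -s p -r | sed -n 's/.*config=\([^ ,(]*\).*/\1/p' | head -1`
    # configuration the exec host currently advertises
    CURRENT=`qconf -se $HOST | sed -n 's/.*config=\([^ ,]*\).*/\1/p'`
    if [ -n "$WANTED" ] && [ "$WANTED" != "$CURRENT" ]; then
        /opt/tests/reconfigure.sh "$WANTED" && \
            qconf -mattr exechost complex_values config=$WANTED $HOST
    fi

The script would then run from cron on the qmaster (or another admin
host), since qconf -mattr needs manager rights.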
This would mean knowing beforehand what will be scheduled next. What about
an approach Cray is using with their `aprun` command to place jobs in the
cluster:
- all jobs run locally on the head node
- there you can use a custom start script, or put it in the job script, to
prepare a real node
- send the stuff to the node
As jobs run exclusively, you can just reboot the exechost the next time to
get rid of all old stuff.
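Only as a sketch of that idea (assuming passwordless ssh from the head
node to the test servers; the host name, setup_configuration.sh and
run_test.sh are placeholders):

    #!/bin/sh
    #$ -l h=headnode                 # the job itself stays on the head node
    NODE=testserver01
    # prepare the real node for this test's configuration
    ssh $NODE /opt/tests/setup_configuration.sh cfgA
    # send the actual work to the node
    ssh $NODE /opt/tests/run_test.sh
    # optionally reboot the exec host afterwards to get rid of old state
    ssh $NODE sudo reboot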
-- Reuti