Hello,
I'm sure this has been asked time and time again, only I can't find it
(my search-fu is failing, somehow).
What's the best way to run an array job so that each task ends up on a
different node (but they run concurrently)? I don't mind other jobs
running on the nodes at the time, but
An exclusive host consumable is the right way to approach the problem. If
the task elements might be part of a parallel environment, then you'll want
to set the scaling to JOB as well.
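A minimal sketch of that setup, with the complex name `one_per_node` purely illustrative:

```shell
# Add a per-host consumable to the complex list (qconf -mc), e.g. a line:
#
#   #name          shortcut  type  relop  requestable  consumable  default  urgency
#   one_per_node   opn       INT   <=     YES          JOB         0        0
#
# consumable=JOB debits the complex once per job rather than once per
# slot, which is what you want for parallel-environment jobs.
#
# Give every execution host exactly one unit of it:
qconf -aattr exechost complex_values one_per_node=1 node001
# Request it at submission, so each array task consumes a host's unit:
qsub -t 1-100 -l one_per_node=1 myarray.sh
```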
On Wed, Apr 02, 2014 at 03:39:03PM +0100, Tina Friedrich wrote:
As an alternative, you could create a simple queue (onejobpernode) with 'slots
1'.
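A rough sketch of that, with queue and script names invented for illustration:

```shell
# Create the queue, then set it to one slot per host (interactively via
# qconf -mq, or non-interactively as below):
qconf -aq onejobpernode
qconf -mattr queue slots 1 onejobpernode
# Submit the array into that queue only:
qsub -q onejobpernode -t 1-100 myarray.sh
```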
-Hugh
-----Original Message-----
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On
Behalf Of Skylar Thompson
Sent: Wednesday, April 02, 2014 11:04 AM
To: Tina Friedrich
Cc:
If I call them with '-l exclusive' they will be the only task on that
node, though. I don't want that; for my job array I simply want no more
than one slot per node to be used. (No need to block the rest of the
node for other jobs.)
But I'm thinking some sort of consumable
You can still get the consumable to work, if you don't have it set by
default. Only people who need to run this type of job would request it.
When it's requested, the host's consumable would decrement, which would
make it ineligible to run another instance of the job until the current job
is done.
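To watch that mechanism in action (assuming a host consumable named, say, one_per_node):

```shell
# Show the remaining value of the consumable on each execution host;
# a host already running one of the tasks will report 0 available:
qhost -F one_per_node
# And for a pending task, the scheduler's view of why it's waiting:
qstat -j 1234
```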
...yes, I could. I'm trying to achieve this without messing with the
cluster config too much.
Still, it is an option. Now I've got three options and no idea which
one would be best :)
Tina
On 02/04/14 16:35, MacMullan, Hugh wrote:
All,
We are running SGE 2011.11 on a CentOS 6.3 cluster.
Twice now we've had the following experience:
-- Any new jobs submitted sit in the qw state, even though there are plenty of
nodes available that could satisfy the requirements of the jobs.
-- top reveals that sge_qmaster is eating way too
We encountered a similar problem with GE 6.2u5, and it turned out to be a
bug in schedd_job_info in sched_conf. Disabling it made our problems go
away. We don't depend heavily on schedd_job_info; most of the time using
-w v with qalter or qsub is sufficient.
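For anyone hitting the same thing, the change is a single scheduler setting (attribute name as in sched_conf(5)):

```shell
# Edit the scheduler configuration and disable the per-job scheduling
# diagnostics that "qstat -j" normally reports:
qconf -msconf
#   schedd_job_info   false
# You can still sanity-check a job without submitting it:
qsub -w v myjob.sh
```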
On Wed, Apr 02, 2014 at 06:27:07PM
Same symptoms seen at one of my clients just yesterday: programmatically
generated scripts that submit a small number of jobs via qsub, all using
a threaded PE or similar. The cluster routinely runs much larger
workloads all the time.
Our sge_qmaster ran the master node out of memory and was killed hard by
On Wed, 2 Apr 2014 at 11:27am, Peskin, Eric wrote
The generated scripts themselves look like reasonable job scripts. The
only twist is using our threaded parallel environment and asking for a
range of slots. An example job is:
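A job of that shape might look like the following sketch (the PE name, slot range, and program are invented for illustration, not the original script):

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -pe threaded 4-12      # threaded PE with a slot range
# NSLOTS is set by Grid Engine to the number of slots actually granted:
./my_program --threads "$NSLOTS"
```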
I hit the same thing -- see
Yep, we use slot ranges for both PE jobs and array jobs. The more complex
the job request, the more likely it was that sge_qmaster would explode.
On Wed, Apr 02, 2014 at 11:54:44AM -0700, Joshua Baker-LePain wrote:
We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
'starter_method' that's affecting OpenMPI jobs.
Following previous discussions on the list[1], we're using the
'environment modules' package, and using a starter_method to initialize
the user's environment as if it was a
We use a starter method here, and it looks something like this (the
original was cut off; the else branch, fi, and closing brace are
completed the obvious way, with variables quoted):

function execute_normal_job() {
    # Start the job normally, with proper login shell handling
    if [ "$SGE_STARTER_USE_LOGIN_SHELL" = true ]
    then
        exec -l "$SGE_STARTER_SHELL_PATH" -c "$@"
    else
        exec "$SGE_STARTER_SHELL_PATH" -c "$@"
    fi
}
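For context, the starter is attached via the queue configuration, along these lines (path and queue name invented):

```shell
# queue_conf(5): point starter_method at the wrapper script
qconf -mattr queue starter_method /opt/sge/scripts/starter.sh all.q
```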
Hi,
On 02.04.2014 at 22:19, berg...@merctech.com wrote: