Hi all,
Today a single user submitted 7000 jobs, and squeue and scancel now return the
error message: Insane Message Length.
I have read in a previous topic on slurm-devel
I have a group here that wants to submit a ton of jobs to the queue, but
wants to restrict how many they have running at any given time so that
they don't torch their fileserver. They were using bgmod -L in LSF to
do this, but they were wondering if there was a similar way in SLURM to
do so.
2013/6/19 Paul Edmon ped...@cfa.harvard.edu:
I have a group here that wants to submit a ton of jobs to the queue, but
wants to restrict how many they have running at any given time so that
they don't torch their fileserver. They were using bgmod -L in LSF to
do this, but they were wondering
Could you just create a dedicated queue for those jobs, and then configure its
priority and max simultaneous settings? Then all they would have to do is
ensure they submit those jobs to that queue.
On Jun 19, 2013, at 8:36 AM, Paul Edmon ped...@cfa.harvard.edu wrote:
I have a group here
Sounds like something you would use a QOS for. That way you get all the
limits from accounting, but they only apply to certain jobs.
On 06/19/13 09:03, Ralph Castain wrote:
Could you just create a dedicated queue for those jobs, and then configure
its priority and max simultaneous settings?
On 06/19/2013 10:36 AM, Paul Edmon wrote:
I have a group here that wants to submit a ton of jobs to the queue, but
wants to restrict how many they have running at any given time so that
they don't torch their fileserver.
The licenses feature might work OK for this. Create a license for the
fileserver, have each job request one, and Slurm will only run as many
of those jobs at once as there are licenses.
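A minimal sketch of that approach, assuming an illustrative license name
(fileserver) and cap (50 concurrent jobs); tune the count to what the
storage actually tolerates:

    # slurm.conf: define 50 floating "fileserver" licenses
    Licenses=fileserver:50

    # Each job requests one license at submission; at most 50 such
    # jobs can then run at the same time
    sbatch -L fileserver:1 job.sh

The drawback is that jobs which don't request the license are unaffected,
so it only helps if the group remembers to ask for it.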
Paul,
We were discussing this yesterday due to a user not limiting the number
of jobs hammering our storage. A QOS with a GrpJobs limit sounds like
the best approach for both us and you.
Ryan
On 06/19/2013 09:36 AM, Paul Edmon wrote:
I have a group here that wants to submit a ton of jobs
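For reference, a hedged sketch of what that setup might look like; the QOS
name (iolimited), account name (fileheavy), and the GrpJobs=50 cap are all
made-up placeholders, and AccountingStorageEnforce in slurm.conf needs to
include limits,qos for the cap to actually be enforced:

    # Create a QOS whose jobs are collectively capped at 50 running at once
    sacctmgr add qos iolimited
    sacctmgr modify qos iolimited set GrpJobs=50

    # Let the group's account use it
    sacctmgr modify account where name=fileheavy set qos+=iolimited

    # The group then submits its I/O-heavy jobs with:
    sbatch --qos=iolimited job.sh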
Hi,
I've tried to look for this, but is there any way to have jobs automatically
resubmitted if they fail? We occasionally have hiccups on random nodes where
a job might fail due to temporary network loss or loss of a storage mount or
what not, and when users send thousands of jobs and, say, 0.1% of them fail,
resubmitting by hand gets tedious.
Thanks for the input. Can GrpJobs be modified from the user side?
-Paul Edmon-
On 06/19/2013 12:15 PM, Ryan Cox wrote:
Paul,
We were discussing this yesterday due to a user not limiting the number
of jobs hammering our storage. A QOS with a GrpJobs limit sounds like
the best approach
I second that! Sounds like the correct approach for data-intensive
computing.
Thanks
Eva
--
University of California, San Diego
SDSC, MC 0505
9500 Gilman Drive
La Jolla, CA 92093-0505    Web: http://www.sdsc.edu/~hocks
(858) 822-0954    email: ho...@sdsc.edu
One note: Only batch jobs will be requeued. We can't do much for jobs
initiated by salloc or srun.
Quoting Aaron Knister aaron.knis...@gmail.com:
Hi Mario,
SLURM can and will, I believe by default, resubmit jobs that fail
due to node failures recognized by slurmctld that put the node in a DOWN state.
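A short sketch of the relevant knobs (the job script name is a placeholder);
this covers only the node-failure requeue being discussed here, not failures
inside the job itself:

    # slurm.conf: requeue batch jobs automatically when a node fails
    # (this is the default)
    JobRequeue=1

    # Per job: explicitly allow requeueing (or forbid it with --no-requeue)
    sbatch --requeue job.sh

    # Manually requeue a job after a transient failure
    scontrol requeue <jobid>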
Okay, thanks.
-Paul Edmon-
On 06/19/2013 04:32 PM, Ryan Cox wrote:
Not that I'm aware of. I don't know of a way to give users control over
a QOS like you can do with account coordinators for accounts.
Ryan
On 06/19/2013 10:55 AM, Paul Edmon wrote:
Thanks for the input. Can GrpJobs be modified from the user side?