A little more research:
When PreemptMode=REQUEUE, the job does not start from a checkpoint even if
one exists. Is there any way to change this behavior?
Further, in an attempt to work around this, I added the following commands
at the beginning of the job script:
if [ -z "$SLURM_RESTART_COUNT" ]; then
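For readers following along, here is a minimal sketch of the kind of check the excerpt above begins. It is not the poster's actual script, which is cut off here. It assumes only that Slurm sets SLURM_RESTART_COUNT when a job has been restarted (for example after a requeue) and leaves it unset on the first run. The function name run_mode is hypothetical and exists only for this illustration.

```shell
#!/bin/bash
# Sketch only: tell a first run apart from a requeued run.
# Assumption: SLURM_RESTART_COUNT is unset on the first run and set
# by Slurm once the job has been restarted.

# Hypothetical helper (not part of Slurm): prints "fresh" or "restart".
run_mode() {
    if [ -z "${SLURM_RESTART_COUNT:-}" ]; then
        echo "fresh"
    else
        echo "restart"
    fi
}

mode=$(run_mode)
echo "run mode: $mode (restart count: ${SLURM_RESTART_COUNT:-0})"
```

A job script could branch on the result, doing initial setup in the "fresh" case and looking for an existing checkpoint in the "restart" case. How the checkpoint itself is located and loaded depends on the application and is not shown.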
Dear Slurm developers,
I have run into a problem: I need to write some additional custom information
into the Slurm MySQL database. Is there any chance to do this? I've looked
through the documentation, but I have not found how to extend the
functionality of SLURM. I've tried to look into the developer guide, but
Hi all,
re-configuring my cluster to use NFSv3 instead of v4 makes the situation
go away. I'll leave it that way for now…
Thanks for the tip,
Uwe
On 19.01.2015 at 23:29, Christopher Samuel wrote:
On 19/01/15 19:46, Uwe Sauter wrote:
yes, going back to Scientific 6.5 makes the
I am trying to write a SPANK plug-in. As an example I am using this one
from the documentation - https://computing.llnl.gov/linux/slurm/spank.html.
When I just compile and configure it as stated in the manual, the
function (_renice_opt_process) that does the command argument processing has
not been
19th WORKSHOP ON JOB SCHEDULING STRATEGIES FOR PARALLEL PROCESSING
(JSSPP'2015)
in conjunction with IPDPS 2015
Hyderabad, India, May 25-29, 2015
http://www.cs.huji.ac.il/~feit/parsched/
==
EXTENDED DEADLINE: February 1st, 2015
==
FWIW: I'm able to send, trap, and process signals as one would expect. Not
sure if our versions match up (I'm on 14.03.7), but a basic bash script
using trap is receiving and processing those signals properly when I use
--batch in scancel. It also seems to work properly when I have
--signal=
Michael, thanks for the info, I wasn't aware of a bug that affected this.
Once I used srun for my actual python script the signal catching worked as
expected. I was also using the long name for bash traps so that likely is
why I never saw things caught when using --signal=B:USR1.
- Trey
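For reference, a minimal sketch of the trap pattern discussed in this thread. It is not either poster's script. It assumes a bash batch script and uses the short signal name USR1, matching the --signal=B:USR1 form mentioned above. The handler name on_usr1 and the got_signal flag are hypothetical, and the script signals itself only so that the example is self-contained; in a real job the signal would come from Slurm or from scancel.

```shell
#!/bin/bash
# Sketch only: catch SIGUSR1 in a batch script via a bash trap.

got_signal=0

# Hypothetical handler: a real job might write a checkpoint or
# forward the signal to its workload here.
on_usr1() {
    got_signal=1
    echo "caught USR1"
}

# Register the handler using the short signal name.
trap on_usr1 USR1

# Stand-in for an external signal: send USR1 to this shell itself.
kill -USR1 $$

echo "got_signal=$got_signal"
```

One point to keep in mind with this pattern: bash runs a trap handler only after the current foreground command returns, so a long-running workload started directly in the script can delay the handler until it finishes.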
Not sure if this has already been reported and fixed. It was being
caused by a single queued job which I cancelled. Resubmitted and it ran
ok.
slurm 14.03.6
Program terminated with signal 11, Segmentation fault.
#0 0x00541d90 in _job_alloc (job_gres_list=<value optimized out>,