[slurm-dev] Re: Preemption, requeue and checkpointing?

2015-01-26 Thread Yoel Jacobsen
A little more research: when PreemptMode=REQUEUE, the job does not restart from a checkpoint even if one exists. Is there any way to change this behavior? Further, in an attempt to work around this, I added the following commands at the beginning of the job script: if [ -z $SLURM_RESTART_COUNT ]; then
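A minimal sketch of the kind of restart check that job script begins with, assuming the application writes its own checkpoint to a file. The `CKPT_FILE` path and the `start_mode` helper are hypothetical names, not Slurm features. Slurm typically leaves `SLURM_RESTART_COUNT` unset on a job's first run and sets it once the job has been requeued or restarted, so an unset or zero value is treated as a fresh start:

```shell
#!/bin/bash
# Sketch only: CKPT_FILE and start_mode are hypothetical names, not Slurm features.
# Assumes the application periodically writes its state to CKPT_FILE.
CKPT_FILE="${CKPT_FILE:-./checkpoint.dat}"

# Print "resume" if this is a requeued run and a checkpoint exists, else "fresh".
start_mode() {
    # SLURM_RESTART_COUNT is normally unset on the first run and holds the
    # number of restarts after a requeue. Defaulting to 0 (and quoting) keeps
    # the numeric test valid when the variable is unset.
    local restarts="${SLURM_RESTART_COUNT:-0}"
    if [ "$restarts" -gt 0 ] && [ -f "$CKPT_FILE" ]; then
        echo "resume"
    else
        echo "fresh"
    fi
}
```

A requeued job simply reruns the batch script from the top, so the script itself has to pick the application's restart path, e.g. `if [ "$(start_mode)" = resume ]; then ...`; how the checkpoint is actually reloaded depends on the application.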

[slurm-dev] SLURM custom job information

2015-01-26 Thread Vladislav Falfushinsky
Dear slurm-dev, I have run into a problem: I need to write some additional custom information into the Slurm MySQL database. Is there any way to do this? I've looked through the documentation, but I have not found how to extend the functionality of SLURM. I've tried to look into the developer guide, but

[slurm-dev] Re: Lock ups with NFSv4 [was: Connection Refused with job cancel]

2015-01-26 Thread Uwe Sauter
Hi all, re-configuring my cluster to use NFSv3 instead of v4 makes the situation go away. I'll leave it that way for now… Thanks for the tip, Uwe On 19.01.2015 at 23:29, Christopher Samuel wrote: On 19/01/15 19:46, Uwe Sauter wrote: yes, going back to Scientific 6.5 make the

[slurm-dev] Extending SLURM using SPANK

2015-01-26 Thread Vladislav Falfushinsky
I am trying to write a SPANK plug-in. As an example I am using the one from the documentation: https://computing.llnl.gov/linux/slurm/spank.html. When I compile and configure it exactly as stated in the manual, the function (_renice_opt_process) that handles command-line argument processing has not been

[slurm-dev] JOB SCHEDULING STRATEGIES FOR PARALLEL PROCESSING: CFP

2015-01-26 Thread jette
19th WORKSHOP ON JOB SCHEDULING STRATEGIES FOR PARALLEL PROCESSING (JSSPP'2015) in conjunction with IPDPS 2015 Hyderabad, India, May 25-29, 2015 http://www.cs.huji.ac.il/~feit/parsched/ == EXTENDED DEADLINE: February 1st, 2015 ==

[slurm-dev] Re: Signals never received by job processes

2015-01-26 Thread Michael Gutteridge
FWIW: I'm able to send, trap, and process signals as one would expect. Not sure if our versions match up (I'm on 14.03.7), but a basic bash script using trap is receiving and processing those signals properly when I use --batch in scancel. It also seems to work properly when I have --signal=

[slurm-dev] Re: Signals never received by job processes

2015-01-26 Thread Trey Dockendorf
Michael, thanks for the info, I wasn't aware of a bug that affected this. Once I used srun for my actual python script the signal catching worked as expected. I was also using the long name for bash traps, which is likely why I never saw things caught when using --signal=B:USR1. - Trey
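For reference, a minimal batch-script sketch of the pattern discussed in this thread. With `--signal=B:USR1@60`, Slurm sends USR1 only to the batch shell, roughly 60 seconds before the time limit. Because bash defers a trap while a foreground command is running, the work is started in the background and the script sits in `wait`, which returns early when a trapped signal arrives. The names `on_usr1` and `run_worker`, and forwarding TERM to the worker, are illustrative choices, not Slurm requirements:

```shell
#!/bin/bash
#SBATCH --signal=B:USR1@60   # deliver USR1 to the batch shell ~60 s before the time limit

caught_usr1=0
worker_pid=""

# Trap handler (hypothetical name): record the signal and forward TERM to the
# worker so it can save state and exit. The signal is named USR1 here.
on_usr1() {
    echo "caught USR1, stopping worker" >&2
    caught_usr1=1
    if [ -n "$worker_pid" ]; then
        kill -TERM "$worker_pid" 2>/dev/null
    fi
}
trap on_usr1 USR1

# Run a command in the background and wait for it. A trapped signal makes
# `wait` return early (status > 128), so wait again while the worker still exists.
run_worker() {
    "$@" &
    worker_pid=$!
    local status
    wait "$worker_pid"
    status=$?
    while kill -0 "$worker_pid" 2>/dev/null; do
        wait "$worker_pid"
        status=$?
    done
    return "$status"
}

# In a real job script the last line would be something like:
#   run_worker srun python my_script.py   # my_script.py is a placeholder
```

Had the work run in the foreground instead, the USR1 trap would only fire after the command finished, which can look as though the signal was never received.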

[slurm-dev] slurmctld segfault

2015-01-26 Thread Franco Broi
Not sure if this has already been reported and fixed. It was being caused by a single queued job which I cancelled. Resubmitted and it ran ok. slurm 14.03.6 Program terminated with signal 11, Segmentation fault. #0 0x00541d90 in _job_alloc (job_gres_list=<value optimized out>,