[slurm-dev] Re: checkpoint/restart feature in SLURM

2016-03-19 Thread Ralph Castain
I am not aware of any MPI that would allow you to relocate a process while the job is running. You have to checkpoint it, terminate it, and then restart the entire job with the new node included. > On Mar 16, 2016, at 9:58 PM, Husen R wrote: > > Dear Slurm-dev, > > > Does checkpoint/restart

[slurm-dev] ConstrainCores=no

2016-03-19 Thread Chris Rutledge
Hello Everyone, I'm having a bit of a strange problem. Basically, it appears our test instance of Slurm is constraining cores even though it is explicitly disabled in the cgroups.conf file. I'm hoping you can help me figure out why. I also can run this code on the test compute node outside of s

[slurm-dev] Re: job can not requeue after preempted

2016-03-19 Thread Benjamin Redling
On 03/17/2016 04:01, 温圣召 wrote: > The preempted job1 shows a PD reason of BeginTime > my job invocation and the info for them are as follows: > [root@szwg]# sbatch --gres=gpu:4 -N 1 --partition=low mybatch.sh You are requesting _4_ GPUs on 1 node. Your config says each node has Gres=gpu:2 > Submitted b
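The mismatch pointed out above (a per-node request of gpu:4 against nodes configured with Gres=gpu:2) can be checked before submitting. What follows is only a rough sketch: `parse_gpu_count` and `request_fits` are hypothetical helper names, not part of Slurm, and the parser handles only the plain `gpu:N` form seen in this thread (no type field).

```python
def parse_gpu_count(gres: str) -> int:
    """Parse the count from a 'gpu:N' string (hypothetical helper).

    Assumption: a bare 'gpu' means a count of 1; typed forms such as
    'gpu:tesla:2' are deliberately not handled in this sketch.
    """
    name, _, count = gres.partition(":")
    if name != "gpu":
        raise ValueError(f"not a gpu gres: {gres!r}")
    return int(count) if count else 1


def request_fits(requested: str, configured: str) -> bool:
    """True if the per-node GPU request does not exceed the node's Gres."""
    return parse_gpu_count(requested) <= parse_gpu_count(configured)


# The values from the thread: --gres=gpu:4 against Gres=gpu:2
print(request_fits("gpu:4", "gpu:2"))  # prints False
```

With the values from the thread the check fails, which matches the observation that no single node can satisfy the request.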

[slurm-dev] job can not requeue after preempted

2016-03-19 Thread 温圣召
Hi all: my job running in a low-priority partition cannot be requeued when it is preempted by a job in a high-priority partition. My slurm.conf is as below: # - PreemptType=preempt/partition_prio PreemptMode=REQUEUE JobRequeue=1 NodeName=cp01-sys-h

[slurm-dev] Node features - Different OS versions

2016-03-19 Thread Clough, Ryan
We are in the process of migrating from CentOS6 to CentOS7. What we would like to do is use node features to allow our users to select which version of CentOS their jobs require. Have we selected the best way to achieve this functionality? What I worry about is users not supplying a constraint caus
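One possible way to address the worry about users who supply no constraint is a small submission wrapper that falls back to a default feature. This is a sketch under assumptions, not something the thread recommends: the feature names `centos6` and `centos7` and the helper `sbatch_argv` are hypothetical, and only the `--constraint` option of sbatch is taken as given.

```python
from typing import List, Optional


def sbatch_argv(script: str,
                constraint: Optional[str] = None,
                default_feature: str = "centos6") -> List[str]:
    """Build an sbatch command line that always carries a constraint.

    'centos6' as the default is an assumption for illustration; a site
    would pick whichever feature name it defined on its nodes.
    """
    feature = constraint or default_feature
    return ["sbatch", f"--constraint={feature}", script]


# No constraint supplied: the default feature is used.
print(sbatch_argv("job.sh"))
# Explicit constraint: the user's choice is kept.
print(sbatch_argv("job.sh", constraint="centos7"))
```

The wrapper only builds the argument list; running it (for example with `subprocess.run`) is left out so the sketch works without a Slurm installation.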

[slurm-dev] Re: job can not requeue after preempted

2016-03-19 Thread Benjamin Redling
On 2016-03-16 13:54, 温圣召 wrote: > my job ... can not be requeue when it preempted ... Can you please post the job invocation too? Does the preempted job1 show a PD reason (%R) in the queue? Regards, Benjamin -- FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html vox: +49 3641 9 44323 | fax: +49

[slurm-dev] Re: ConstrainCores=no

2016-03-19 Thread Rémi Palancher
Hi Chris, Le 18/03/2016 14:05, Chris Rutledge a écrit : Hello Everyone, I'm having a bit of a strange problem. Basically, it appears our test instance of Slurm is constraining cores even though it is explicitly disabled in the cgroups.conf file. I'm hoping you can help me figure out why. Wel

[slurm-dev] Re: Node features - Different OS versions

2016-03-19 Thread Rémi Palancher
Le 18/03/2016 19:49, Clough, Ryan a écrit : We are in the process of migrating from CentOS6 to CentOS7. What we would like to do is use node features to allow our users to select which version of CentOS their jobs require. Have we selected the best way to achieve this functionality? What I worry

[slurm-dev] checkpoint/restart feature in SLURM

2016-03-19 Thread Husen R
Dear Slurm-dev, Is the checkpoint/restart feature available in SLURM able to relocate an MPI application from one node to another node while it is running? For example, I run an MPI application on nodes A, B and C in a cluster and I want to migrate/relocate the process running on node A to another node, le

[slurm-dev] Can I create workdir in taskprolog ?

2016-03-19 Thread shengzhao wen
Hi all: I submit my job as follows. #] sbatch --workdir=/home/work/job/tmp/job20160318154722494.4157 --job-name=new_test --mail-type=ALL --mail-user=w...@baidu.com --nodes=1 --partition=low --priority=0 --time=2880 mybatch.sh and there is no /home/work/job/tmp/job20160318154722494.4157 dir
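An alternative to creating the directory in a task prolog is to create it on the submit side before calling sbatch. This swaps in a different approach from the one the subject line asks about, and it assumes the path is on a filesystem shared with the compute nodes. `prepare_submission` is a hypothetical helper; `--workdir` is the option used in the message above (I believe newer Slurm releases spell it `--chdir`).

```python
import os
import tempfile
from typing import List


def prepare_submission(workdir: str, script: str) -> List[str]:
    """Create the working directory, then build the sbatch command line."""
    # exist_ok=True makes a repeated submission with the same path harmless.
    os.makedirs(workdir, exist_ok=True)
    return ["sbatch", f"--workdir={workdir}", script]


# Demo in a throwaway temp directory rather than a real job path.
base = tempfile.mkdtemp()
demo_dir = os.path.join(base, "job-demo")
argv = prepare_submission(demo_dir, "mybatch.sh")
print(argv)
```

The sketch stops at building the argument list, so it can be tried without Slurm installed.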

[slurm-dev] RE: checkpoint/restart feature in SLURM

2016-03-19 Thread John Hearns
O I'll we k lo Sent from my Windows Phone From: Husen R Sent: 17/03/2016 05:56 To: slurm-dev Subject: [slurm-dev] checkpoint/restart feature in SLURM Dear Slurm-dev, Does checkpoint/restart feature availa